I tried the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of the box without any finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else):
https://preview.redd.it/fciatottq7oa1.png?width=1046&format=png&auto=webp&s=f88304a77b09e367e8b9812ba4b841e028481645
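For context, the "Alpaca prompt" is a fixed instruction template; a minimal builder for the no-input variant might look like this (wording follows the Stanford Alpaca repo; the example instruction is just illustrative):

```python
def alpaca_prompt(instruction: str) -> str:
    """Wrap an instruction in the standard Alpaca template (no-input variant)."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

prompt = alpaca_prompt("Explain what an RNN is in one sentence.")
```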
You are welcome to try it in RWKV 14B Gradio (click examples below the panel):
https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio
Tips: try "Expert Response" or "Expert Long Response" or "Expert Full Response" too.
https://preview.redd.it/qo71b85vq7oa1.png?width=2516&format=png&auto=webp&s=5d4467ba4bbc9016839760b3f3873f06c8b4bc6f
ChatRWKV v2 is now using a CUDA kernel to optimize INT8 inference (23 token/s on 3090): https://github.com/BlinkDL/ChatRWKV
Upgrade to the latest code, run "pip install rwkv --upgrade" to get 0.5.0, and set os.environ["RWKV_CUDA_ON"] = '1' in v2/chat.py to enjoy the speedup.
The inference speed (and VRAM consumption) of RWKV is independent of ctxlen, because it's an RNN (note: currently the preprocessing of a long prompt takes more VRAM but that can be optimized because we can process in chunks).
Meanwhile I find the latest RWKV-4-Pile-14B-20230313-ctx8192-test1050 model can utilize a long ctx:
https://preview.redd.it/a68dw0hzq7oa1.png?width=398&format=png&auto=webp&s=80570ccc844fa31efa1282d5b2106b9986e35b5a
submitted by /u/bo_peng
Organizations use messaging platforms like Microsoft Teams to bring the right people together to securely communicate with each other and collaborate to get work done. Microsoft Teams captures invaluable organizational knowledge in the form of the information that flows through it as users collaborate. However, making this knowledge easily and securely available to users can […]
We tend to impute AI with human-like qualities. However, choosing to give your AI system a personality has its advantages and…
As artificial intelligence (AI) continues to advance and become more pervasive in our daily lives, it is crucial that we consider the…
Artificial Intelligence (AI) has transformed the way we live, work, and communicate, and it is now playing a significant role in the art…
Image-to-image reconstruction problems with free or inexpensive metadata in
the form of class labels appear often in biological and medical image domains.
Existing text-guided or style-transfer image-to-image approaches do not
translate to datasets where additional information is provided as discrete
classes. We introduce and implement a model which combines image-to-image and
class-guided denoising diffusion probabilistic models. We train our model on a
real-world dataset of microscopy images used for drug discovery, with and
without incorporating metadata labels. By exploring the properties of
image-to-image diffusion with relevant labels, we show that class-guided
image-to-image diffusion can improve the meaningful content of the
reconstructed images and outperform the unguided model in useful downstream
tasks.
Neural network approaches to approximate the ground state of quantum
hamiltonians require the numerical solution of a highly nonlinear optimization
problem. We introduce a statistical learning approach that makes the
optimization trivial by using kernel methods. Our scheme is an approximate
realization of the power method, where supervised learning is used to learn the
next step of the power iteration. We show that the ground state properties of
arbitrary gapped quantum hamiltonians can be reached with polynomial resources
under the assumption that the supervised learning is efficient. Using kernel
ridge regression, we provide numerical evidence that the learning assumption is
verified by applying our scheme to find the ground states of several
prototypical interacting many-body quantum systems, both in one and two
dimensions, showing the flexibility of our approach.
Sequential decision making in the real world often requires finding a good
balance of conflicting objectives. In general, there exist a plethora of
Pareto-optimal policies that embody different patterns of compromises between
objectives, and it is technically challenging to obtain them exhaustively using
deep neural networks. In this work, we propose a novel multi-objective
reinforcement learning (MORL) algorithm that trains a single neural network via
policy gradient to approximately obtain the entire Pareto set in a single run
of training, without relying on linear scalarization of objectives. The
proposed method works in both continuous and discrete action spaces with no
design change of the policy network. Numerical experiments in benchmark
environments demonstrate the practicality and efficacy of our approach in
comparison to standard MORL baselines.
Figuring out small molecule binding sites in target proteins, in the
resolution of either pocket or residue, is critical in many virtual and real
drug-discovery scenarios. Since it is not always easy to find such binding
sites based on domain knowledge or traditional methods, different deep learning
methods that predict binding sites out of protein structures have been
developed in recent years. Here we present a new such deep learning algorithm,
that significantly outperformed all state-of-the-art baselines at both
resolutions: pocket and residue. This good performance was
also demonstrated in a case study involving the protein human serum albumin and
its binding sites. Our algorithm included new ideas both in the model
architecture and in the training method. For the model architecture, it
incorporated SE(3)-invariant geometric self-attention layers that operate on
top of residue-level CNN outputs. This residue-level processing of the model
allowed a transfer learning between the two resolutions, which turned out to
significantly improve the binding pocket prediction. Moreover, we developed a
novel augmentation method based on protein homology, which prevented our model
from over-fitting. Overall, we believe that our contribution to the literature
is twofold. First, we provided a new computational method for binding site
prediction that is relevant to real-world applications, as shown by the good
performance on different benchmarks and a case study. Second, the novel ideas in
our method (the model architecture, transfer learning and the
homology augmentation) would serve as useful components in
future works.
The secret’s out. Thanks to ChatGPT, everyone knows about the power of modern AI. To find out what’s coming next, tune in to NVIDIA founder and CEO Jensen Huang’s keynote address at NVIDIA GTC on Tuesday, March 21, at 8 a.m. Pacific. Huang will share his vision for the future of AI and how NVIDIA Read article >
Hello everyone. As a side project, I created a website that generated over 7,000 articles in one week, each with roughly 800 to 1,000 words, all using the GPT-3.5 Turbo API in a fully automated manner. I created a Python script (also generated by GPT) into which I feed a list of topics; it generates the content and automatically posts it to WordPress. In addition, I integrated the Google Images API to fetch an image and post it automatically as well. Currently, I can create around 10 posts per minute. And the cost? To generate these 7,000 posts with 7,000 images, I have spent $40 so far!
So far, however, I don't know how Google or Bing will handle this AI-generated content or whether it will affect SEO, but I'm here to find out.
If you are interested in how I did it, along with some videos, check my post: https://www.tigove.com/how/how-i-created-a-website-with-7000-post-with-chatgpt/
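The author's actual script is not shown; a rough sketch of the pipeline described (generate with the Chat Completions API, then post via the WordPress REST API) could look like this. The WP_URL and all names here are hypothetical placeholders, not the author's code:

```python
import json
import urllib.request

OPENAI_URL = "https://api.openai.com/v1/chat/completions"   # real endpoint
WP_URL = "https://example.com/wp-json/wp/v2/posts"          # hypothetical site

def generate_article(topic, api_key):
    # Ask gpt-3.5-turbo for an ~800-1000 word article on the given topic.
    req = urllib.request.Request(
        OPENAI_URL,
        data=json.dumps({
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user",
                          "content": f"Write an 800-1000 word article about {topic}."}],
        }).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def build_wp_payload(title, body, image_url=None):
    # Assemble a WordPress REST API post payload; prepend the image if given.
    content = (f'<img src="{image_url}">\n' if image_url else "") + body
    return {"title": title, "content": content, "status": "publish"}
```

The payload would then be POSTed to WP_URL with Basic or application-password authentication.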
submitted by /u/maurimbr
https://medium.com/@wiroll/fake-news-chatbots-and-the-state-of-journalism-bf95c187e582
Basically...I (ChatGPT) wrote an op-ed with the essential hypothesis of, "let's double speeds in school zones in the name of safety" and...it got published...in a place I don't live...with no verification.
Problematic?
submitted by /u/KillBosby
Part 1: Understanding Zero-Shot Learning
We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 (248M parameters) model implemented in HuggingFace. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI).
In ~20 hours on a single GPU, we achieve ~40 RougeL on the SNI test set, compared to ~42 RougeL of the original model available on HuggingFace Hub and pre-trained through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.
Our core contribution is not the T5 model itself, which follows the HuggingFace implementation. Instead, we optimise everything else in the training pipeline to offer you a user-friendly starting template for your NLP application/research.
We are keen to hear your suggestions to improve the codebase further.
Github: https://github.com/PiotrNawrot/nanoT5
Twitter: https://twitter.com/p_nawrot/status/1636373725397520384
https://preview.redd.it/zluas7u235oa1.png?width=1152&format=png&auto=webp&s=68d413aa702b2160785a9f95e5cb00318fbfcdb4
submitted by /u/korec1234
bloomz.cpp allows running inference of BLOOM-like models in pure C/C++ (inspired by llama.cpp). It supports all models that can be loaded with BloomForCausalLM.from_pretrained(). For example, you can achieve 16 tokens per second on an M1 Pro.
submitted by /u/hackerllama
Hello! I read the following article about Microsoft laying off their AI Ethics team: https://www.cmswire.com/customer-experience/microsoft-cuts-ai-ethics-and-society-team-as-part-of-layoffs/
In your experience, what value do AI ethics teams add? Do they actually add useful insight, or do they serve more as a PR thing? I’ve heard conflicting anecdotes for each side. Is there anything you think AI ethics as a field can do to be more useful and to get more change? Thanks!
submitted by /u/namey-name-name
An update is now available for NVIDIA Canvas, the free beta app that harnesses the power of AI to help artists quickly turn simple brushstrokes into realistic landscapes.
Disney Dreamlight Valley is streaming from Steam and Epic Games Store on GeForce NOW starting today. It’s one of two new games this week that members can stream with beyond-fast performance using a GeForce NOW Ultimate membership. Game as if using a PC on any device — at up to 4K resolution and 120 frames Read article >
Peter Ma was bored in his high school computer science class. So he decided to teach himself something new: how to use artificial intelligence to find alien life. That’s how he eventually became the lead author of a groundbreaking study published in Nature Astronomy. The study reveals how Ma and his co-authors used AI to Read article >
This is a study on the potential widespread usage of alternative fuel
vehicles, linking them with the socio-economic status of the respective
consumers as well as the impact on the resulting air quality index. Research in
this area aims to leverage machine learning techniques in order to promote
appropriate policies for the proliferation of alternative fuel vehicles such as
electric vehicles with due justice to different population groups. The Pearson
correlation coefficient is used to model the relationships between
socio-economic data, the air quality index and data on alternative fuel vehicles.
Linear regression is used to predict the air quality index
as a function of the adoption of alternative fuel vehicles and of socio-economic
factors. This work exemplifies artificial intelligence for social good.
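As an illustration of the two statistical tools named above (not the paper's own code), here is a minimal pure-Python Pearson correlation and one-variable least-squares fit:

```python
import math

def pearson_r(x, y):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ols_fit(x, y):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx
```

In the study's setting, x would be a socio-economic indicator or an adoption count and y the air quality index.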
Moiré engineering in atomically thin van der Waals heterostructures creates
artificial quantum materials with designer properties. We solve the many-body
problem of interacting electrons confined to a moiré superlattice potential
minimum (the moiré atom) using a 2D fermionic neural network. We show that
strong Coulomb interactions in combination with the anisotropic moiré
potential lead to striking "Wigner molecule" charge density distributions
observable with scanning tunneling microscopy.
Diffusion models have become a popular approach for image generation and
reconstruction due to their numerous advantages. However, most diffusion-based
inverse problem-solving methods only deal with 2D images, and even recently
published 3D methods do not fully exploit the 3D distribution prior. To address
this, we propose a novel approach using two perpendicular pre-trained 2D
diffusion models to solve the 3D inverse problem. By modeling the 3D data
distribution as a product of 2D distributions sliced in different directions,
our method effectively addresses the curse of dimensionality. Our experimental
results demonstrate that our method is highly effective for 3D medical image
reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing
MRI, and sparse-view CT. Our method can generate high-quality voxel volumes
suitable for medical applications.
Artwork recommendation is challenging because it requires understanding how
users interact with highly subjective content, the complexity of the concepts
embedded within the artwork, and the emotional and cognitive reflections they
may trigger in users. In this paper, we focus on efficiently capturing the
elements (i.e., latent semantic relationships) of visual art for personalized
recommendation. We propose and study recommender systems based on textual and
visual feature learning techniques, as well as their combinations. We then
perform a small-scale and a large-scale user-centric evaluation of the quality
of the recommendations. Our results indicate that textual features compare
favourably with visual ones, whereas a fusion of both captures the most
suitable hidden semantic relationships for artwork recommendation. Ultimately,
this paper contributes to our understanding of how to deliver content that
suitably matches the user's interests and how they are perceived.
Adversarial training (AT) methods have been found to be effective against
adversarial attacks on deep neural networks. Many variants of AT have been
proposed to improve its performance. Pang et al. [1] have recently shown that
incorporating hypersphere embedding (HE) into the existing AT procedures
enhances robustness. We observe that the existing AT procedures are not
designed for the HE framework, and thus fail to adequately learn the angular
discriminative information available in the HE framework. In this paper, we
propose integrating HE into AT with regularization terms that exploit the rich
angular information available in the HE framework. Specifically, our method,
termed angular-AT, adds regularization terms to AT that explicitly enforce
weight-feature compactness and inter-class separation; all expressed in terms
of angular features. Experimental results show that angular-AT further improves
adversarial robustness.
The performance of fault diagnosis systems is highly affected by data quality
in cyber-physical power systems. These systems generate massive amounts of data
that overburden the system with excessive computational costs. Another issue is
the presence of noise in recorded measurements, which prevents building a
precise decision model. Furthermore, the diagnostic model is often provided
with a mixture of redundant measurements that may divert it from learning the
normal and fault distributions. This paper presents the effect of feature
engineering on mitigating the aforementioned challenges in cyber-physical
systems. Feature selection and dimensionality reduction methods are combined
with decision models to simulate data-driven fault diagnosis in a 118-bus power
system. A comparative study is enabled accordingly to compare several advanced
techniques in both domains. Dimensionality reduction and feature selection
methods are compared both jointly and separately. Finally, experiments are
concluded, and a setting is suggested that enhances data quality for fault
diagnosis.
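As a toy illustration of the feature-selection stage (a generic variance threshold, not the specific techniques compared in the paper), near-constant measurement columns can be dropped like this:

```python
def variance_threshold(rows, tau):
    # Keep the indices of columns whose variance exceeds tau -- a minimal
    # stand-in for the feature-selection stage described above.
    cols = list(zip(*rows))
    keep = []
    for j, col in enumerate(cols):
        m = sum(col) / len(col)
        var = sum((v - m) ** 2 for v in col) / len(col)
        if var > tau:
            keep.append(j)
    return keep
```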
The outbreak of the COVID-19 pandemic revealed the criticality of timely
intervention in a situation exacerbated by a shortage in medical staff and
equipment. Pain-level screening is the initial step toward identifying the
severity of patient conditions. Automatic recognition of states and feelings
helps in identifying patient symptoms, taking immediate adequate action, and
providing a patient-centric medical plan tailored to a patient's state. In this
paper, we propose a framework for pain-level detection for deployment in the
United Arab Emirates and assess its performance using the most used approaches
in the literature. Our results show that a deployment of a pain-level deep
learning detection framework is promising in identifying the pain level
accurately.
Several approximate inference methods have been proposed for deep discrete
latent variable models. However, non-parametric methods which have previously
been successfully employed for classical sparse coding models have largely been
unexplored in the context of deep models. We propose a non-parametric iterative
algorithm for learning discrete latent representations in such deep models.
Additionally, to learn scale invariant discrete features, we propose local data
scaling variables. Lastly, to encourage sparsity in our representations, we
propose a Beta-Bernoulli process prior on the latent factors. We evaluate our
sparse coding model, coupled with different likelihood models, across datasets
with varying characteristics, and compare our results to current amortized
approximate inference methods.
Hall effect thrusters are among the most versatile and popular electric
propulsion systems for space use. Industry trends towards interplanetary
missions are driving advances in the design of such propulsion systems. It is
understood that correct sizing of the discharge channel in a Hall effect
thruster greatly impacts performance. Since the complete physics model of such
a propulsion system is not yet optimized for fast computations and design
iterations, most thrusters are designed using so-called scaling laws. This
work, however, focuses on a rather novel approach, which is outlined less
frequently in the literature than the ordinary scaling design approach. Using
deep machine learning, it is possible to create a predictive performance model,
which can be used to effortlessly obtain a Hall thruster design with the
required characteristics, using far less computational power than designing
from scratch and offering far more flexibility than the usual scaling approach.
Our research deals with the optimization version of the set partition
problem, where the objective is to minimize the absolute difference between the
sums of the two disjoint partitions. Although this problem is known to be
NP-hard and requires exponential time to solve, we propose a less demanding
version of this problem where the goal is to find a locally optimal solution.
In our approach, we consider local optimality with respect to any movement of
at most two elements. To accomplish this, we developed an algorithm that can
generate a locally optimal solution in at most $O(N^2)$ time and $O(N)$ space.
Our algorithm can handle arbitrary input precisions and does not require
positive or integer inputs. Hence, it can be applied in various problem
scenarios with ease.
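The abstract does not include the algorithm itself; as a rough illustration of local search over one- and two-element moves (a sketch, not the authors' O(N^2)-guaranteed method), consider:

```python
def local_partition(nums):
    # Greedy start: add each number (largest first) to the lighter side.
    a, b, sa, sb = [], [], 0, 0
    for x in sorted(nums, reverse=True):
        if sa <= sb:
            a.append(x); sa += x
        else:
            b.append(x); sb += x
    # Local search: accept one-element moves and pairwise swaps that
    # strictly shrink |sa - sb|, until no such move exists.
    improved = True
    while improved:
        improved = False
        diff = sa - sb
        for x in list(a):                        # move x from a to b
            if abs(diff - 2 * x) < abs(diff):
                a.remove(x); b.append(x)
                sa -= x; sb += x; diff = sa - sb
                improved = True
        for y in list(b):                        # move y from b to a
            if abs(diff + 2 * y) < abs(diff):
                b.remove(y); a.append(y)
                sb -= y; sa += y; diff = sa - sb
                improved = True
        for x in list(a):                        # swap x in a with y in b
            for y in list(b):
                if abs(diff - 2 * (x - y)) < abs(diff):
                    a.remove(x); b.remove(y)
                    a.append(y); b.append(x)
                    sa += y - x; sb += x - y; diff = sa - sb
                    improved = True
                    break                        # x has moved; next x
    return sorted(a), sorted(b), abs(sa - sb)
```

Each accepted move strictly reduces the imbalance, so the loop terminates at a solution that is locally optimal with respect to these moves.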
who's applying and what are you planning to build??? https://www.axios.com/2023/03/15/mozilla-responsible-ai-challenge
submitted by /u/joodfish
Here are the samples. My favourite is this one! Which one is your favourite?
These samples are the product of a transformer (encoder) model trained on only 3 hours of music. Each sample is seeded by the first four bars of a real piece of music. These are the final samples before I completely overhaul the pre-training stage. The idea is to go from about 2 hours of MIDI to over 500 hours. I'm very excited to see how this affects the sample quality.
If anyone is interested in following the project, star the GitHub repo and follow me on Twitter.
submitted by /u/ustainbolt
Baidu will unveil its conversational AI ERNIE Bot, powered by Baidu's in-house LLMs, on March 16. The ERNIE LLM was first proposed as a language understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters.
ERNIE 1.0: https://arxiv.org/abs/1904.09223
ERNIE 2.0: https://arxiv.org/abs/1907.12412
ERNIE 3.0: https://arxiv.org/abs/2112.12731
ERNIE for text-to-image: https://arxiv.org/abs/2210.15257
ERNIE Bot live-stream on YouTube: https://www.youtube.com/watch?v=ukvEUI3x0vI
submitted by /u/kizumada
Hello everyone,
I'd like to show you a "working AlphaZero implementation that's simple enough to be able to understand what's going on at a quick glance, without sacrificing too much."
Link: https://github.com/scascin0/alphazero
submitted by /u/ayan0k0ji
Global leader in convenient foods and beverages PepsiCo is deploying advanced machine vision technology from startup KoiReader Technologies, powered by the NVIDIA AI platform and GPUs, to improve efficiency and accuracy in its distribution process. PepsiCo has identified KoiReader’s technology as a solution to enable greater efficiency in reading warehouse labels. This AI-powered innovation helps Read article >
It all started with two software engineers and a tomato farmer on a West Coast road trip. Visiting farms to survey their needs, the three hatched a plan at an apple orchard: build a highly adaptable 3D vision AI system for automating field tasks. Verdant, based in the San Francisco Bay Area, is developing AI Read article >
Tens of thousands of AWS customers use AWS machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. For customers who have been developing ML models on premises, such as their local desktop, they want to migrate their legacy ML models to the AWS Cloud to fully take advantage of […]
Hey r/MachineLearning,
We are collecting a hand-crafted curated list of awesome curated lists closely related to machine learning.
Here is the link to the Github repo: https://github.com/zhimin-z/awesome-awesome-machine-learning
Do any lists need to be included from your perspective? Please let me know, or feel free to submit a pull request.
The motivation underlying this project is that so many awesome lists regarding machine learning exist on GitHub that it gradually becomes a mental burden to remember where to look, especially as the ML world progresses faster and faster these days.
Thus this project was born, as a unification that sews together all awesome lists closely related to machine learning.
submitted by /u/happybirdie007
Learn how to create mind-blowing AI art with just a few keywords! This guide will show you how to use an AI model to generate stunning digital art, step by step!
https://youtu.be/HmrqjqyxeCo
submitted by /u/TheQuestionStation
Today, tens of thousands of customers are building, training, and deploying machine learning (ML) models using Amazon SageMaker to power applications that have the potential to reinvent their businesses and customer experiences. These ML models have been increasing in size and complexity over the last few years, which has led to state-of-the-art accuracies across a […]
When I was getting my MBA at the University of Iowa in 1981, my advisor Gary Fethke (who would later serve as University of Iowa interim president and Emeritus Professor in Business Analytics) convinced me to take a PhD class in econometrics. I think he was trying to punish me or something. I was totally… Read More »Future of Education: Application not Regurgitation of Knowledge – Part I
The post Future of Education: Application not Regurgitation of Knowledge – Part I appeared first on Data Science Central.
As a part of my teaching for AI at the University of Oxford, I read a large number of books which are based on the maths of data science. Data Science and Machine Learning Mathematical and Statistical Methods is a book i recommend if you like the maths of data science. There is a pdf… Read More »Data Science and Machine Learning Mathematical and Statistical Methods
The post Data Science and Machine Learning Mathematical and Statistical Methods appeared first on Data Science Central.
Announcements Our Revamped Submission Guidelines Since our migration to WordPress, we have been looking to solidify a set of guidelines for writers to look at prior to submitting that will give them a rough idea of the quality standards the editors are looking for. Many of you will be familiar with our Tips and Tricks… Read More »DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines
The post DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines appeared first on Data Science Central.
Paper - https://arxiv.org/abs/2303.05398
submitted by /u/MysteryInc152
Researchers used machine learning to build faster and more efficient hash functions, which are a key component of databases.
This post is co-written with Mahima Agarwal, Machine Learning Engineer, and Deepak Mettem, Senior Engineering Manager, at VMware Carbon Black VMware Carbon Black is a renowned security solution offering protection against the full spectrum of modern cyberattacks. With terabytes of data generated by the product, the security analytics team focuses on building machine learning (ML) […]
Amazon SageMaker Ground Truth Plus is a managed data labeling service that makes it easy to label data for machine learning (ML) applications. One common use case is semantic segmentation, which is a computer vision ML technique that involves assigning class labels to individual pixels in an image. For example, in video frames captured by […]
(Image Source) Remote work has skyrocketed in the last three years. And with that comes increased productivity, happier employees, and lower overhead costs. But unfortunately, it’s not all sunshine and rainbows for companies with remote teams. Studies show that employees working from home increase the frequency of cyberattacks by 238%. And with the global average… Read More »How to Implement a Data Privacy and Protection Strategy for Remote Teams
The post How to Implement a Data Privacy and Protection Strategy for Remote Teams appeared first on Data Science Central.
We introduce weak barycenters of a family of probability distributions, based
on the recently developed notion of optimal weak transport of mass by Gozlan et
al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical
analysis of this object and discuss its interpretation in the light of convex
ordering between probability measures. In particular, we show that, rather than
averaging the input distributions in a geometric way (as the Wasserstein
barycenter based on classic optimal transport does), weak barycenters extract
common geometric information shared by all the input distributions, encoded as
a latent random variable that underlies all of them. We also provide an
iterative algorithm to compute a weak barycenter for a finite family of input
distributions, and a stochastic algorithm that computes them for arbitrary
populations of laws. The latter approach is particularly well suited for the
streaming setting, i.e., when distributions are observed sequentially. The
notion of weak barycenter and our approaches to compute it are illustrated on
synthetic examples, validated on 2D real-world data and compared to standard
Wasserstein barycenters.
With the development of hardware accelerators and their corresponding tools,
evaluations have become more affordable through fast and massively parallel
evaluations in some applications. This advancement has drastically sped up the
runtime of evolution-inspired algorithms such as Quality-Diversity
optimization, creating tremendous potential for algorithmic innovation through
scale. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD
algorithm based on Evolution Strategies (ES) designed for fast parallel
evaluations. MEMES builds on top of the existing MAP-Elites-ES algorithm,
scaling it by maintaining multiple independent ES threads with massive
parallelization. We also introduce a new dynamic reset procedure for the
lifespan of the independent ES to autonomously maximize the improvement of the
QD population. We show experimentally that MEMES outperforms existing
gradient-based and objective-agnostic QD algorithms when compared in terms of
generations. We perform this comparison on both black-box optimization and
QD-Reinforcement Learning tasks, demonstrating the benefit of our approach
across different problems and domains. Finally, we also find that our approach
intrinsically enables optimization of fitness locally around a niche, a
phenomenon not observed in other QD algorithms.
This tutorial introduces the CMA Evolution Strategy (ES), where CMA stands
for Covariance Matrix Adaptation. The CMA-ES is a stochastic, or randomized,
method for real-parameter (continuous domain) optimization of non-linear,
non-convex functions. We try to motivate and derive the algorithm from
intuitive concepts and from requirements of non-linear, non-convex search in
continuous domain.
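As a concrete illustration of the evolution-strategy loop the tutorial derives, here is a deliberately simplified sketch: an isotropic (mu, lambda)-ES with a fixed step-size decay instead of covariance matrix adaptation, applied to the sphere function. All names here are illustrative; this is not the reference CMA-ES implementation.

```python
import random

def sphere(x):
    """Toy objective: minimum 0 at the origin."""
    return sum(v * v for v in x)

def simple_es(f, x0, sigma=0.5, lam=20, mu=5, iters=200, seed=0):
    """(mu, lambda)-ES with an isotropic Gaussian search distribution.

    A full CMA-ES would additionally adapt a covariance matrix and the
    step size; here both are fixed/decayed for clarity.
    """
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(iters):
        # Sample lambda offspring around the current mean.
        pop = [[m + sigma * rng.gauss(0, 1) for m in mean] for _ in range(lam)]
        pop.sort(key=f)  # ascending: best (lowest) objective first
        # Recombine: new mean is the average of the mu best offspring.
        mean = [sum(ind[d] for ind in pop[:mu]) / mu for d in range(len(mean))]
        sigma *= 0.98  # crude step-size decay instead of adaptation
    return mean

best = simple_es(sphere, [3.0, -2.0, 1.5])
```

Even this stripped-down variant drives the sphere function close to its optimum; the covariance adaptation that gives CMA-ES its name is what makes the full method effective on ill-conditioned, non-separable problems.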
The use of unlicensed spectrum for cellular systems to mitigate spectrum
scarcity has led to the development of intelligent adaptive approaches to
spectrum access that improve upon traditional carrier sensing and
listen-before-talk methods. We study decentralized contention-based medium
access for base stations (BSs) of a single Radio Access Technology (RAT)
operating on unlicensed shared spectrum. We devise a distributed deep
reinforcement learning-based algorithm for both contention and adaptive
modulation, modelled on a two-state Markov decision process, that attempts to
maximize a network-wide downlink throughput objective. Empirically, we find the
(proportional fairness) reward accumulated by a policy gradient approach to be
significantly higher than even a genie-aided adaptive energy detection
threshold. Our approaches are further validated by improved sum and peak
throughput. The scalability of our approach to large networks is demonstrated
via an improved cumulative reward earned on both indoor and outdoor layouts
with a large number of BSs.
It is common to utilise dynamic models to measure the tyre-road friction in
real-time. Alternatively, predictive approaches estimate the tyre-road friction
by identifying the environmental factors affecting it. This work aims to
formulate the problem of friction estimation as a visual perceptual learning
task. The problem is broken down into detecting surface characteristics by
applying semantic segmentation and using the extracted features to predict the
frictional force. To the best of our knowledge, this work is the first to formulate friction estimation as a regression from the latent space of a semantic segmentation model. Preliminary results indicate that this approach can
estimate frictional force.
In this case study we trained and published a state-of-the-art open-source
model for Automatic Speech Recognition (ASR) for German to evaluate the current
potential of this technology for the use in the larger context of Digital
Humanities and cultural heritage indexation. Along with this paper we publish
our wav2vec2 based speech to text model while we evaluate its performance on a
corpus of historical recordings we assembled compared against commercial
cloud-based and proprietary services. While our model achieves moderate
results, we see that proprietary cloud services fare significantly better. As
our results show, recognition rates over 90 percent can currently be achieved,
however, these numbers drop quickly once the recordings feature limited audio
quality or use of non-every day or outworn language. A big issue is the high
variety of different dialects and accents in the German language. Nevertheless,
this paper highlights that the currently available quality of recognition is
high enough to address various use cases in the Digital Humanities. We argue
that ASR will become a key technology for the documentation and analysis of
audio-visual sources and identify an array of important questions that the DH
community and cultural heritage stakeholders will have to address in the near
future.
General robotic grippers are challenging to control because of their rich
nonsmooth contact dynamics and the many sources of uncertainties due to the
environment or sensor noise. In this work, we demonstrate how to compute 6-DoF
grasp poses using simulation-based Bayesian inference through the full
stochastic forward simulation of the robot in its environment while robustly
accounting for many of the uncertainties in the system. A Riemannian manifold
optimization procedure preserving the nonlinearity of the rotation space is
used to compute the maximum a posteriori grasp pose. Simulation and physical
benchmarks show the promising high success rate of the approach.
When dealing with electro or magnetoencephalography records, many supervised
prediction tasks are solved by working with covariance matrices to summarize
the signals. Learning with these matrices requires using Riemannian geometry to
account for their structure. In this paper, we propose a new method to deal
with distributions of covariance matrices and demonstrate its computational
efficiency on M/EEG multivariate time series. More specifically, we define a
Sliced-Wasserstein distance between measures of symmetric positive definite
matrices that comes with strong theoretical guarantees. Then, we take advantage
of its properties and kernel methods to apply this distance to brain-age
prediction from MEG data and compare it to state-of-the-art algorithms based on
Riemannian geometry. Finally, we show that it is an efficient surrogate to the
Wasserstein distance in domain adaptation for Brain Computer Interface
applications.
An efficient deep learning model that can be implemented in real-time for
polyp detection is crucial to reducing polyp miss-rate during screening
procedures. Convolutional neural networks (CNNs) are vulnerable to small
changes in the input image. A CNN-based model may miss the same polyp appearing
in a series of consecutive frames and produce unstable detection outputs due to
changes in camera pose, lighting condition, light reflection, etc. In this
study, we attempt to tackle this problem by integrating temporal information
among neighboring frames. We propose an efficient feature concatenation method
for a CNN-based encoder-decoder model without adding complexity to the model.
The proposed method incorporates extracted feature maps of previous frames to
detect polyps in the current frame. The experimental results demonstrate that
the proposed method of feature concatenation improves the overall performance
of automatic polyp detection in videos. The following results are obtained on a
public video dataset: sensitivity 90.94%, precision 90.53%, and specificity 92.46%.
Accuracy validation of cortical thickness measurement is a difficult problem
due to the lack of ground truth data. To address this need, many methods have
been developed to synthetically induce gray matter (GM) atrophy in an MRI via
deformable registration, creating a set of images with known changes in
cortical thickness. However, these methods often cause blurring in atrophied
regions, and cannot simulate realistic atrophy within deep sulci where
cerebrospinal fluid (CSF) is obscured or absent. In this paper, we present a
solution using a self-supervised inpainting model to generate CSF in these
regions and create images with more plausible GM/CSF boundaries. Specifically,
we introduce a novel, 3D GAN model that incorporates patch-based dropout
training, edge map priors, and sinusoidal positional encoding, all of which are
established methods previously limited to 2D domains. We show that our
framework significantly improves the quality of the resulting synthetic images
and is adaptable to unseen data with fine-tuning. We also demonstrate that our
resulting dataset can be employed for accuracy validation of cortical
segmentation and thickness measurement.
We provide an example of a distribution preserving source separation method,
which aims at addressing perceptual shortcomings of state-of-the-art methods.
Our approach uses unconditioned generative models of signal sources.
Reconstruction is achieved by means of mix-consistent sampling from a
distribution conditioned on a realization of a mix. The separated signals
follow their respective source distributions, which provides an advantage when
separation results are evaluated in a listening test.
3D human mesh recovery from a 2D pose plays an important role in various
applications. However, it is hard for existing methods to simultaneously
capture the multiple relations during the evolution from skeleton to mesh,
including joint-joint, joint-vertex and vertex-vertex relations, which often
leads to implausible results. To address this issue, we propose a novel
solution, called GATOR, that contains an encoder of Graph-Aware Transformer
(GAT) and a decoder with Motion-Disentangled Regression (MDR) to explore these
multiple relations. Specifically, GAT combines a GCN and a graph-aware
self-attention in parallel to capture physical and hidden joint-joint
relations. Furthermore, MDR models joint-vertex and vertex-vertex interactions
to explore joint and vertex relations. Based on the clustering characteristics
of vertex offset fields, MDR regresses the vertices by composing the predicted
base motions. Extensive experiments show that GATOR achieves state-of-the-art
performance on two challenging benchmarks.
Modelling dynamical systems is an integral component for understanding the
natural world. To this end, neural networks are becoming an increasingly
popular candidate owing to their ability to learn complex functions from large
amounts of data. Despite this recent progress, there has not been an adequate
discussion on the architectural regularization that neural networks offer when
learning such systems, hindering their efficient usage. In this paper, we
initiate a discussion in this direction using coordinate networks as a test
bed. We interpret dynamical systems and coordinate networks from a signal
processing lens, and show that simple coordinate networks with few layers can
be used to solve multiple problems in modelling dynamical systems, without any
explicit regularizers.
Agglomerative hierarchical clustering based on Ordered Weighted Averaging
(OWA) operators not only generalises the single, complete, and average
linkages, but also includes intercluster distances based on a few nearest or
farthest neighbours, trimmed and winsorised means of pairwise point
similarities, amongst many others. We explore the relationships between the
famous Lance-Williams update formula and the extended OWA-based linkages with
weights generated via infinite coefficient sequences. Furthermore, we provide
some conditions for the weight generators to guarantee the resulting
dendrograms to be free from unaesthetic inversions.
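To make the generalization concrete, here is a hypothetical sketch (function and variable names are mine, not the paper's) of an OWA-based intercluster distance: sort the pairwise point distances and take a weighted average, so that single, complete, and average linkage fall out as special choices of the weight vector:

```python
def owa_linkage(pairwise, weights):
    """OWA-based intercluster distance: sort the pairwise point
    distances in decreasing order, then take the weighted average."""
    d = sorted(pairwise, reverse=True)
    assert len(weights) == len(d)
    return sum(w * x for w, x in zip(weights, d))

# Pairwise distances between all point pairs across two clusters.
dists = [4.0, 1.0, 3.0, 2.0]
n = len(dists)

complete = owa_linkage(dists, [1.0] + [0.0] * (n - 1))   # all weight on the max
single   = owa_linkage(dists, [0.0] * (n - 1) + [1.0])   # all weight on the min
average  = owa_linkage(dists, [1.0 / n] * n)             # uniform weights
```

Other weight vectors give the trimmed, winsorised, and few-nearest-neighbour variants the abstract mentions.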
We propose a new 6-DoF grasp pose synthesis approach from 2D/2.5D input based
on keypoints. A keypoint-based grasp detector from image input has demonstrated promising results in a previous study, where the additional visual
information provided by color images compensates for the noisy depth
perception. However, it relies heavily on accurately predicting the location of
keypoints in the image space. In this paper, we devise a new grasp generation
network that reduces the dependency on precise keypoint estimation. Given an
RGB-D input, our network estimates both the grasp pose from keypoint detection
as well as scale towards the camera. We further re-design the keypoint output
space in order to mitigate the negative impact of keypoint prediction noise on the Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method
outperforms the baseline by a large margin, validating the efficacy of our
approach. Finally, despite trained on simple synthetic objects, our method
demonstrate sim-to-real capacity by showing competitive results in real-world
robot experiments.
Despite the impressive performance of vision-based pose estimators, they
generally fail to perform well under adverse vision conditions and often don't
satisfy the privacy demands of customers. As a result, researchers have begun
to study tactile sensing systems as an alternative. However, these systems
suffer from noisy and ambiguous recordings. To tackle this problem, we propose
a novel solution for pose estimation from ambiguous pressure data. Our method
comprises a spatio-temporal vision transformer with an encoder-decoder
architecture. Detailed experiments on two popular public datasets reveal that
our model outperforms existing solutions in the area. Moreover, we observe that
increasing the number of temporal crops in the early stages of the network
positively impacts the performance while pre-training the network in a
self-supervised setting using a masked auto-encoder approach also further
improves the results.
Rainfall data collected by various remote sensing instruments such as radars
or satellites has different space-time resolutions. This study aims to improve
the temporal resolution of radar rainfall products to help with more accurate
climate change modeling and studies. In this direction, we introduce a solution
based on EfficientNetV2, namely EfficientTempNet, to increase the temporal
resolution of radar-based rainfall products from 10 minutes to 5 minutes. We
tested EfficientRainNet over a dataset for the state of Iowa, US, and compared
its performance to three different baselines to show that EfficientTempNet
presents a viable option for better climate change monitoring.
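As a point of reference for what a learned model must beat, the naive baseline for doubling temporal resolution is linear interpolation between consecutive radar frames; a minimal sketch (illustrative only, not the paper's method):

```python
def interpolate_frames(frames):
    """Insert a linearly interpolated frame between each pair of
    consecutive rainfall frames, doubling the temporal resolution
    (e.g. 10-minute -> 5-minute spacing). Each frame is a flat list
    of rainfall intensities."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])  # midpoint frame
    out.append(frames[-1])
    return out

frames = [[0.0, 2.0], [4.0, 2.0], [0.0, 0.0]]
dense = interpolate_frames(frames)  # 3 frames -> 5 frames
```

A learned model like EfficientTempNet aims to beat this by reconstructing the nonlinear motion and growth of rain cells rather than blending frames pixel-wise.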
Tensor decomposition is now being used for data analysis, information
compression, and knowledge recovery. However, the mathematical properties of tensor decomposition are not yet fully understood because it is a singular learning machine. In this paper, we give an upper bound on the real log canonical threshold (RLCT) of tensor decomposition using an algebraic-geometrical method and theoretically derive its Bayesian generalization error. We also examine its mathematical properties through numerical experiments.
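For background on why the RLCT matters: in Watanabe's singular learning theory, the expected Bayesian generalization error of a singular model admits the asymptotic expansion

$$\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),$$

where $n$ is the sample size and $\lambda$ is the RLCT, so an upper bound on $\lambda$ directly bounds the leading term of the generalization error. (This is the standard general result, stated here as context rather than as the paper's specific bound.)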
Automatic Speech Recognition (ASR) in medical contexts has the potential to
save time, cut costs, increase report accuracy, and reduce physician burnout.
However, the healthcare industry has been slower to adopt this technology, in
part due to the importance of avoiding medically-relevant transcription
mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR
metric that penalizes clinically-relevant mistakes more than others. We
demonstrate that this metric more closely aligns with clinician preferences on
medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.),
sometimes by wide margins. We collect a benchmark of 13 clinician preferences
on 149 realistic medical sentences called the Clinician Transcript Preference
benchmark (CTP), demonstrate that CBERTScore more closely matches what
clinicians prefer, and release the benchmark for the community to further
develop clinically-aware ASR metrics.
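The idea of weighting errors by clinical relevance can be illustrated with a word-level weighted edit distance, where a substitution on a drug name costs more than one on a filler word. This is purely illustrative: the weights and functions below are hypothetical, and the actual CBERTScore is built on BERTScore embeddings, not edit distance.

```python
def weighted_word_errors(ref, hyp, weight):
    """Word-level edit distance where each substitution, deletion, or
    insertion is charged weight(word) instead of 1, so errors on
    clinically important words cost more."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = minimal weighted cost to turn r[:i] into h[:j]
    dp = [[0.0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        dp[i][0] = dp[i - 1][0] + weight(r[i - 1])
    for j in range(1, len(h) + 1):
        dp[0][j] = dp[0][j - 1] + weight(h[j - 1])
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (0 if r[i - 1] == h[j - 1]
                                      else max(weight(r[i - 1]), weight(h[j - 1])))
            dp[i][j] = min(sub,
                           dp[i - 1][j] + weight(r[i - 1]),   # deletion
                           dp[i][j - 1] + weight(h[j - 1]))   # insertion
    return dp[-1][-1]

# Hypothetical weighting: drug names cost 5x as much as other words.
clinical = {"metoprolol", "propranolol"}
w = lambda word: 5.0 if word in clinical else 1.0

benign_err = weighted_word_errors("please take metoprolol daily",
                                  "pls take metoprolol daily", w)     # 1.0
drug_err = weighted_word_errors("please take metoprolol daily",
                                "please take propranolol daily", w)   # 5.0
```

The same single-word error is penalized five times as heavily when it lands on a drug name, which is the behavior a clinically-aware metric should exhibit.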
Classical multidimensional scaling (CMDS) is a technique that aims to embed a
set of objects in a Euclidean space given their pairwise Euclidean distance
matrix. The main part of CMDS is based on double centering a squared distance
matrix and employing a truncated eigendecomposition to recover the point
coordinates. A central result in CMDS connects the squared Euclidean matrix to
a Gram matrix derived from the set of points. In this paper, we study a dual
basis approach to classical multidimensional scaling. We give an explicit
formula for the dual basis and fully characterize the spectrum of an essential
matrix in the dual basis framework. We make connections to a related problem in
metric nearness.
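The double-centering step at the heart of CMDS can be checked directly: with the centering matrix J = I - (1/n) 11^T, the matrix B = -1/2 J D2 J recovers the Gram matrix of the centered point configuration. A small pure-Python verification (illustrative, tiny n):

```python
def double_center(d2):
    """Apply B = -1/2 * J * D2 * J to a squared-distance matrix D2,
    using the equivalent entrywise form
    B[i][j] = -1/2 * (D2[i][j] - rowmean_i - colmean_j + totalmean)."""
    n = len(d2)
    row = [sum(r) / n for r in d2]
    col = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[-0.5 * (d2[i][j] - row[i] - col[j] + tot) for j in range(n)]
            for i in range(n)]

# Three points on a line: squared distances are known in closed form.
pts = [0.0, 1.0, 3.0]
d2 = [[(a - b) ** 2 for b in pts] for a in pts]
B = double_center(d2)
# Centered coordinates are [-4/3, -1/3, 5/3], so B[i][j] = c_i * c_j.
```

In full CMDS, a truncated eigendecomposition of B then yields the embedding coordinates.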
Unfolding networks have shown promising results in the Compressed Sensing
(CS) field. Yet, the investigation of their generalization ability is still in
its infancy. In this paper, we perform generalization analysis of a
state-of-the-art ADMM-based unfolding network, which jointly learns a decoder
for CS and a sparsifying redundant analysis operator. To this end, we first
impose a structural constraint on the learnable sparsifier, which parametrizes
the network's hypothesis class. For the latter, we estimate its Rademacher
complexity. With this estimate in hand, we deliver generalization error bounds
for the examined network. Finally, the validity of our theory is assessed and
numerical comparisons to a state-of-the-art unfolding network are made, on
synthetic and real-world datasets. Our experimental results demonstrate that
our proposed framework complies with our theoretical findings and outperforms
the baseline consistently across all datasets.
In recent years, knowledge distillation has become a cornerstone of
efficiently deployed machine learning, with labs and industries using knowledge
distillation to train models that are inexpensive and resource-optimized.
Trojan attacks have contemporaneously gained significant prominence, revealing
fundamental vulnerabilities in deep learning models. Given the widespread use
of knowledge distillation, in this work we seek to exploit the unlabelled data
knowledge distillation process to embed Trojans in a student model without
introducing conspicuous behavior in the teacher. We ultimately devise a Trojan
attack that effectively reduces student accuracy, does not alter teacher
performance, and is efficiently constructible in practice.
The estimation of probability density functions is a non-trivial task that
over the last years has been tackled with machine learning techniques.
Successful applications can be obtained using models inspired by the Boltzmann
machine (BM) architecture. In this manuscript, the product Jacobi-Theta
Boltzmann machine (pJTBM) is introduced as a restricted version of the
Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection
matrix. We show that score matching, based on the Fisher divergence, can be
used to fit probability densities with the pJTBM more efficiently than with the
original RTBM.
I put together this plain PyTorch implementation of LLaMA (I just substituted the fairscale layers with the native ones and converted the weights accordingly) that can be more easily run in different environments.
The big problem with the official implementation is that in order to run the 65B version you need 8 GPUs no matter what, and to run the 30B version you need 4, and so on. In reality you can easily fit the 65B version in 2 A100s with 100G of VRAM.
vanilla-llama solves this problem. You just need enough total memory, and the model will be loaded across all the available GPUs.
https://github.com/galatolofederico/vanilla-llama
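The "fill all available GPUs" idea can be sketched as a greedy assignment of layers to devices by free memory. This is a hypothetical illustration of the allocation logic, not vanilla-llama's actual code:

```python
def assign_layers(layer_sizes, gpu_free_mem):
    """Greedily assign model layers (by parameter size, in bytes) to
    whichever GPU currently has the most free memory, so the model is
    spread across all available devices instead of a fixed number."""
    free = dict(gpu_free_mem)          # gpu id -> free bytes
    placement = {}                     # layer index -> gpu id
    for i, size in enumerate(layer_sizes):
        gpu = max(free, key=free.get)  # device with the most headroom
        if free[gpu] < size:
            raise MemoryError("model does not fit on available GPUs")
        placement[i] = gpu
        free[gpu] -= size
    return placement

# Four equally sized layers over two equal GPUs end up evenly split.
placement = assign_layers([10, 10, 10, 10], {0: 25, 1: 25})
```

Real loaders additionally account for activations, buffers, and inter-device transfer costs, but the core idea is the same: placement is driven by measured capacity rather than a hard-coded GPU count.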
submitted by /u/poppear
we take a closer look at Aicolumns - an online platform dedicated to artificial intelligence. Discover the latest AI tools, trends, and insights from a team of expert writers. Whether you're a seasoned AI professional or just starting out, aicolumns.com is your ultimate guide to all things AI.
https://youtu.be/927XESjV3kg
submitted by /u/Bassissou23
https://github.com/jacobgil/confidenceinterval
pip install confidenceinterval
tl;dr: you no longer have an excuse not to use confidence intervals!
In statistics, confidence intervals are commonly reported alongside accuracy metrics to help interpret them.
For example, an AUC metric might be 0.9, but if the 95% confidence interval is in the range [0.7, 0.96], we can't confidently say we didn't just get lucky - we should be really careful making decisions around that result.
More formally, a confidence interval gives us a range on where the true unknown accuracy metric could be, and a 95% confidence interval means that if we repeated the experiment many times, 95% of the confidence intervals we reported would contain the actual true metric (which is unknown) - this property is called coverage.
…
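For illustration, the 95% Wilson score interval for a measured accuracy (a proportion of correct predictions) fits in a few lines. This is a generic textbook formula, not necessarily the method the package uses by default:

```python
import math

def wilson_interval(correct, total, z=1.96):
    """95% Wilson score confidence interval for a proportion,
    e.g. classification accuracy = correct / total."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total
                                   + z * z / (4 * total * total))
    return center - half, center + half

lo, hi = wilson_interval(90, 100)   # 90% accuracy on 100 samples
```

On 100 samples, a 90% point estimate comes with an interval roughly [0.83, 0.94] - exactly the kind of uncertainty that a bare accuracy number hides.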
Decompose Python libraries and generate coherent hierarchical topic models of the repository.
https://github.com/danielpatrickhug/GitModel
GitModel can bootstrap its own codebase: since it generates hierarchical topic trees of GitHub repositories, it can be run on itself, analyzing its own code (and other codebases) and feeding the resulting insights back into its development. This self-referential loop supports more effective code generation, better semantic graph generation, and improved text generation capabilities.
I spent around 10 hours today on a major refactor creating a simple pipeline abstraction and allowing dynamic instantiation from yaml configs. It now also supports multiple GNN heads.
Please try it out and let me know what you think!
Example:
https://github.com/deepmind/clrs
https://preview.redd.it/ut4fc6c401na1.png?width=1506&format=png&auto=webp&s=d757356424b933cfa039cd922e27ec85bdffe0d4
submitted by /u/NovelspaceOnly
Midjourney seems to consistently have the best results. I've had very mixed results with Stable Diffusion, Lexica, and others like OpenJourney.
What model is closest to Midjourney's results but is open source and/or has an API?
submitted by /u/sideprojects_ai
AI Weirdness: the strange side of machine learning
We study a heterogeneous agent macroeconomic model with an infinite number of
households and firms competing in a labor market. Each household earns income
and engages in consumption at each time step while aiming to maximize a concave
utility subject to the underlying market conditions. The households aim to find
the optimal saving strategy that maximizes their discounted cumulative utility
given the market condition, while the firms determine the market conditions
through maximizing corporate profit based on the household population behavior.
The model captures a wide range of applications in macroeconomic studies, and
we propose a data-driven reinforcement learning framework that finds the
regularized competitive equilibrium of the model. The proposed algorithm enjoys
theoretical guarantees in converging to the equilibrium of the market at a
sub-linear rate.
Bayesian Causal Forests (BCF) is a causal inference machine learning model
based on a highly flexible non-parametric regression and classification tool
called Bayesian Additive Regression Trees (BART). Motivated by data from the
Trends in International Mathematics and Science Study (TIMSS), which includes
data on student achievement in both mathematics and science, we present a
multivariate extension of the BCF algorithm. With the help of simulation
studies we show that our approach can accurately estimate causal effects for
multiple outcomes subject to the same treatment. We also apply our model to
Irish data from TIMSS 2019. Our findings reveal the positive effects of having
access to a study desk at home (Mathematics ATE 95% CI: [0.20, 11.67]) while
also highlighting the negative consequences of students often feeling hungry at
school (Mathematics ATE 95% CI: [-11.15, -2.78], Science ATE 95% CI:
[-10.82,-1.72]) or often being absent (Mathematics ATE 95% CI: [-12.47,
-1.55]).
We introduce a class of networked Markov potential games where agents are
associated with nodes in a network. Each agent has its own local potential
function, and the reward of each agent depends only on the states and actions
of agents within a $\kappa$-hop neighborhood. In this context, we propose a
localized actor-critic algorithm. The algorithm is scalable since each agent
uses only local information and does not need access to the global state.
Further, the algorithm overcomes the curse of dimensionality through the use of
function approximation. Our main results provide finite-sample guarantees up to
a localization error and a function approximation error. Specifically, we
achieve an $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity measured by
the averaged Nash regret. This is the first finite-sample bound for multi-agent
competitive games that does not depend on the number of agents.
A rigorous formalization of desired system requirements is indispensable when
performing any verification task. This often limits the application of
verification techniques, as writing formal specifications is an error-prone and
time-consuming manual task. To facilitate this, we present nl2spec, a framework
for applying Large Language Models (LLMs) to derive formal specifications (in
temporal logics) from unstructured natural language. In particular, we
introduce a new methodology to detect and resolve the inherent ambiguity of
system requirements in natural language: we utilize LLMs to map subformulas of
the formalization back to the corresponding natural language fragments of the
input. Users iteratively add, delete, and edit these sub-translations to amend
erroneous formalizations, which is easier than manually redrafting the entire
formalization. The framework is agnostic to specific application domains and
can be extended to similar specification languages and new neural models. We
perform a user study to obtain a challenging dataset, which we use to run
experiments on the quality of translations. We provide an open-source
implementation, including a web-based frontend.
Blackwell's approachability is a very general sequential decision framework
where a Decision Maker obtains vector-valued outcomes, and aims at the
convergence of the average outcome to a given "target" set. Blackwell gave a
sufficient condition for the Decision Maker to have a strategy guaranteeing such convergence against an adversarial environment, as well as what we now call Blackwell's algorithm, which ensures this convergence. Blackwell's
approachability has since been applied to numerous problems, in online learning
and game theory, in particular. We extend this framework by allowing the
outcome function and the dot product to be time-dependent. We establish a
general guarantee for the natural extension to this framework of Blackwell's
algorithm. In the case where the target set is an orthant, we present a family
of time-dependent dot products which yields different convergence speeds for
each coordinate of the average outcome. We apply this framework to the Big
Match (one of the most important toy examples of stochastic games) where an
$\epsilon$-uniformly optimal strategy for Player I is given by Blackwell's
algorithm in a well-chosen auxiliary approachability problem.
Samples can be found here and here. See how they compare to the original chorales and fugues.
The model uses a Transformer encoder architecture to complete partially corrupted sequence representations of music. A version of Gibbs sampling is then used to construct new music from scratch. The entire model was trained in under 30 minutes on a single Tesla V100 - really showcasing the efficiency of Transformers in general.
Note that the fugue samples are seeded by the first three bars of an actual Bach fugue. The chorales are generated completely from scratch!
For more information on how it works - see the GitHub repo or follow me on Twitter.
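The mask-and-refill loop behind this kind of generation can be sketched generically (a toy Gibbs-style resampler with a stand-in `model_fill`, not the repo's actual code or a trained model):

```python
import random

random.seed(0)

NOTES = "CDEFGAB"

def model_fill(sequence, masked_positions):
    # Stand-in for the trained Transformer encoder: the real model predicts
    # each masked token conditioned on the unmasked context; here we just
    # pick random notes so the loop structure is visible.
    return {i: random.choice(NOTES) for i in masked_positions}

def gibbs_generate(length=16, steps=50, mask_frac=0.25):
    # Start from noise, then repeatedly mask a random subset of positions
    # and let the model re-fill them, keeping the rest fixed.
    seq = [random.choice(NOTES) for _ in range(length)]
    for _ in range(steps):
        masked = random.sample(range(length), int(length * mask_frac))
        for i, token in model_fill(seq, masked).items():
            seq[i] = token
    return seq

piece = gibbs_generate()
```

With a real model in place of `model_fill`, each pass nudges the sequence toward the learned data distribution, which is how new chorales emerge from scratch.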
submitted by /u/ustainbolt
( 43
min )
I recently delved into the world of transformers and their application to vision tasks.
As part of my learning process, I implemented the Vision Transformer (ViT) from scratch using PyTorch. I am sharing my implementation and a step-by-step guide to implementing the model in this post.
I hope you find it helpful.
Github: https://github.com/tintn/vision-transformer-from-scratch
Post: https://medium.com/towards-data-science/implementing-vision-transformer-vit-from-scratch-3e192c6155f0
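Independent of the linked implementation, the first ViT step, cutting an image into flattened non-overlapping patches, can be sketched in plain Python (toy values, no PyTorch):

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened non-overlapping patches."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            # Each patch is flattened row-major; ViT then projects it to an
            # embedding and adds a positional encoding.
            patch = [image[i + di][j + dj]
                     for di in range(patch_size)
                     for dj in range(patch_size)]
            patches.append(patch)
    return patches

# A 4x4 "image" split into 2x2 patches -> 4 patches of 4 values each.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = image_to_patches(img, 2)
```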
submitted by /u/Tin_Ng
( 43
min )
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Data Wrangler enables you to access data from a wide variety of popular sources (Amazon S3, Amazon Athena, Amazon Redshift, Amazon EMR and Snowflake) and over 40 other third-party sources. […]
( 10
min )
In this two-part series, we demonstrate how to label and train models for 3D object detection tasks. In part 1, we discuss the dataset we’re using, as well as any preprocessing steps, to understand and label data. In part 2, we walk through how to train a model on your dataset and deploy it to […]
( 13
min )
Online fraud has a widespread impact on businesses and requires an effective end-to-end strategy to detect and prevent new account fraud and account takeovers, and stop suspicious payment transactions. In this post, we show a serverless approach to detect online transaction fraud in near-real time. We show how you can apply this approach to various data streaming and event-driven architectures, depending on the desired outcome and actions to take to prevent fraud (such as alert the user about the fraud or flag the transaction for additional review).
( 7
min )
Aleksander Mądry urges lawmakers to ask rigorous questions about how AI tools are being used by corporations.
( 8
min )
The computer science and philosophy double-major aims to advance the field of AI ethics.
( 9
min )
The AI landscape is being reshaped by the rise of generative models capable of synthesizing high-quality data, such as text, images, music, and videos. The course toward democratization of AI helped to further popularize generative AI following the open-source releases for such foundation model families as BERT, T5, GPT, CLIP and, most recently, Stable Diffusion. […]
( 9
min )
As machine learning (ML) models have improved, data scientists, ML engineers and researchers have shifted more of their attention to defining and bettering data quality. This has led to the emergence of a data-centric approach to ML and various techniques to improve model performance by focusing on data requirements. Applying these techniques allows ML practitioners […]
( 9
min )
Aided by machine learning, scientists are working to develop a vaccine that would be effective against all SARS-Cov-2 strains.
( 10
min )
It’s a thrilling GFN Thursday with GRID Legends racing to the cloud this week. It leads a total of eight new games expanding the GeForce NOW library. New content for Rainbow Six Siege is also now streaming. Plus, two new cities are now online with GeForce RTX 4080 performance for cloud gaming. Chicago and Montreal Read article >
( 6
min )
Hi, I work at Intel as an academic outreach coordinator. I'm sharing Intel's open-source OpenVINO toolkit for optimizing and deploying AI inference on CPUs, discrete and integrated GPUs, and other accelerators like Movidius VPUs and Intel FPGAs. The GitHub repo has over 60 Jupyter notebooks that work on Intel PCs/laptops running Windows or Linux, or on Macs running macOS, including M1 processors.
Try out the Stable Diffusion Jupyter notebook (#225), or the vehicle recognition and detection notebook (#218).
It's easy to install with pip: 9 simple steps on Windows, 8 on macOS, and 7 on Linux.
submitted by /u/JayMBurris
( 43
min )
I had to do a couple of tries but I think overall the results are impressive. Here it is:
https://www.youtube.com/watch?v=LcrLopIoJeA&t=14s&ab_channel=Triviadetodo
submitted by /u/laburanta
( 41
min )
MIT researchers uncover the structural properties and dynamics of deep classifiers, offering novel explanations for optimization, generalization, and approximation in deep networks.
( 8
min )
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. SageMaker provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so […]
( 10
min )
This post is co-authored with Hernan Figueroa, Sr. Manager Data Science at Marubeni Power International. Marubeni Power International Inc (MPII) owns and invests in power business platforms in the Americas. An important vertical for MPII is asset management for renewable energy and energy storage assets, which are critical to reduce the carbon intensity of our […]
( 10
min )
Reinforcement learning (RL) encompasses a class of machine learning (ML) techniques that can be used to solve sequential decision-making problems. RL techniques have found widespread applications in numerous domains, including financial services, autonomous navigation, industrial control, and e-commerce. The objective of an RL problem is to train an agent that, given an observation from its […]
( 11
min )
Dear community,
I have written a Medium article presenting the top 5 resources that helped me learn DRL fast, going from zero (I was previously a researcher in Bayesian optimization) to doing research in these topics: https://medium.com/@eduardogarrido90/you-can-do-it-top-5-resources-to-easily-learn-deep-reinforcement-learning-d0bdef295cc6. I hope you like it.
Best,
submitted by /u/EduCGM
( 42
min )
Hi all,
I've been uploading blog posts to Medium summarizing the content from Reinforcement Learning, 2nd Edition, along with code examples.
I remember wishing someone had done this when I was first learning about RL, so I've decided to do it, hoping it might help anyone who's just getting started. I've summarized up to chapter 4 thus far, and the posts can be found here: https://medium.com/@numsmt2
I plan on going through the entire book. Hope this helps!
submitted by /u/Common-Mushroom2333
( 41
min )
The authors of fastdup ran an analysis on LAION 400M and ImageNet21K. Here's what they found.
Analysing LAION
LAION 400M - TLDR video.
60M duplicates.
962K broken images.
Various label discrepancies.
ImageNet21K - Link to blog post.
1.2M duplicate images.
104K train/val leak.
GitHub repo - https://github.com/visual-layer/fastdup
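Exact-duplicate detection of the kind reported above can be sketched with plain content hashing (a stdlib toy with made-up file names; fastdup itself compares image embeddings, which also catches near-duplicates):

```python
import hashlib
from collections import defaultdict

def find_exact_duplicates(files):
    """Group byte-identical blobs; `files` maps name -> raw bytes."""
    groups = defaultdict(list)
    for name, blob in files.items():
        # Identical bytes hash identically, so duplicates land in one bucket.
        groups[hashlib.sha256(blob).hexdigest()].append(name)
    return [names for names in groups.values() if len(names) > 1]

dupes = find_exact_duplicates({
    "a.jpg": b"\x89PNGdata",
    "b.jpg": b"\x89PNGdata",   # byte-identical copy of a.jpg
    "c.jpg": b"otherdata",
})
```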
submitted by /u/WatercressTraining
( 44
min )
The Academy Award nominations are in — and for the 15th year in a row, NVIDIA technologies worked behind the scenes of every film nominated for Best Visual Effects. The five VFX contenders for the 95th annual Academy Awards, taking place on Sunday, March 12, include: All Quiet on the Western Front Avatar: The Way Read article >
( 7
min )
An adrenaline-fueled virtual ride in the sky is sure to satisfy all thrill seekers — courtesy of 3D artist Kosei Wano’s sensational animation, Moon Hawk. Wano outlines his creative workflow this week In the NVIDIA Studio.
( 7
min )
Preparing a retailer’s online catalog once required expensive physical photoshoots to capture products from every angle. A Tel Aviv startup is saving brands time and money by transforming these camera clicks into mouse clicks. Hexa uses GPU-accelerated computing to help companies turn their online inventory into 3D renders that shoppers can view in 360 degrees, Read article >
( 6
min )
Announcements Repetitions of History: Can You Trust Your Eyes (or Ears)? We find ourselves today in a conversation similar to the one that occurred in the 1880s, when photography became widespread. Artists and critics derided photography because it lacked “that refined feeling and sentiment which animate the productions of a man of genius.” They believed photography lacked a… Read More »DSC Weekly 7 March 2023 – Repetitions of History: Can You Trust Your Eyes (or Ears)?
The post DSC Weekly 7 March 2023 – Repetitions of History: Can You Trust Your Eyes (or Ears)? appeared first on Data Science Central.
( 20
min )
Deploying models at scale can be a cumbersome task for many data scientists and machine learning engineers. However, Amazon SageMaker endpoints provide a simple solution for deploying and scaling your machine learning (ML) model inferences. Our last blog post and GitHub repo on hosting a YOLOv5 TensorFlowModel on Amazon SageMaker Endpoints sparked a lot of interest […]
( 7
min )
This post presents and compares options and recommended practices on how to manage Python packages and virtual environments in Amazon SageMaker Studio notebooks. A public GitHub repo provides hands-on examples for each of the presented approaches. Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, […]
( 14
min )
The Amazon International Seller Growth (ISG) team runs the CSBA (Customer Service by Amazon) program that supports over 200,000 third-party Merchant Fulfilled Network (MFN) sellers. Amazon call centers facilitate hundreds of thousands of phone calls, chats, and emails going between the consumers and Amazon MFN sellers. The large volume of contacts creates a challenge for […]
( 10
min )
Yammer is a social networking platform designed for open and dynamic communications and collaborations within organizations. It allows you to build communities of interest, gather ideas and feedback, and keep everyone informed. It’s available via browser or mobile app, and provides a variety of common social networking features such as private and public communities, news […]
( 8
min )
We consider a reinforcement learning setting in which the deployment
environment is different from the training environment. Applying a robust
Markov decision process formulation, we extend the distributionally robust
$Q$-learning framework studied in Liu et al. [2022]. Further, we improve the
design and analysis of their multi-level Monte Carlo estimator. Assuming access
to a simulator, we prove that the worst-case expected sample complexity of our
algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error
in the sup norm is upper bounded by $\tilde
O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where
$\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support
probability of the transition kernels and $\delta$ is the uncertainty size.
This is the first sample complexity result for the model-free robust RL
problem. Simulation studies further validate our theoretical results.
( 2
min )
Intent detection with semantically similar fine-grained intents is a
challenging task. To address it, we reformulate intent detection as a
question-answering retrieval task by treating utterances and intent names as
questions and answers. To that end, we utilize a question-answering retrieval
architecture and adopt a two-stage training scheme with batch contrastive
loss. In the pre-training stage, we improve query representations through
self-supervised training. Then, in the fine-tuning stage, we increase
contextualized token-level similarity scores between queries and answers from
the same intent. Our results on three few-shot intent detection benchmarks
achieve state-of-the-art performance.
( 2
min )
Recent studies indicate that deep learning plays a crucial role in the
automated visual inspection of road infrastructures. However, current learning
schemes are static, implying no dynamic adaptation to users' feedback. To
address this drawback, we present a few-shot learning paradigm for the
automated segmentation of road cracks, which is based on a U-Net architecture
with recurrent residual and attention modules (R2AU-Net). The retraining
strategy dynamically fine-tunes the weights of the U-Net as a few new rectified
samples are being fed into the classifier. Extensive experiments show that the
proposed few-shot R2AU-Net framework outperforms other state-of-the-art
networks in terms of Dice and IoU metrics, on a new dataset, named CrackMap,
which is made publicly available at https://github.com/ikatsamenis/CrackMap.
( 2
min )
This paper proposes a new GNN design strategy. This strategy relies on
Context-Free Grammars (CFGs) generating the matrix language MATLANG. It enables
us to ensure WL-expressive power, substructure counting abilities, and
spectral properties. Applying our strategy, we design Grammatical Graph Neural
Network G$ ^2$N$^2$, a provably 3-WL GNN able to count at edge-level cycles of
length up to 6 and able to reach band-pass filters. A large number of
experiments covering these properties corroborate the presented theoretical
results.
( 2
min )
The problem of optimization on Stiefel manifold, i.e., minimizing functions
of (not necessarily square) matrices that satisfy orthogonality constraints,
has been extensively studied. Yet a new approach is proposed, based for the
first time on an interplay between thoughtfully designed continuous and discrete
dynamics. It leads to a gradient-based optimizer with intrinsically added
momentum. This method exactly preserves the manifold structure but does not
require additional operation to keep momentum in the changing (co)tangent
space, and thus has low computational cost and pleasant accuracy. Its
generalization to adaptive learning rates is also demonstrated. Notable
performances are observed in practical tasks. For instance, we found that
placing orthogonal constraints on attention heads of trained-from-scratch
Vision Transformer [Dosovitskiy et al. 2022] could markedly improve its
performance, when our optimizer is used, and it is better that each head is
made orthogonal within itself but not necessarily to other heads. This
optimizer also makes the useful notion of Projection Robust Wasserstein
Distance [Paty & Cuturi 2019; Lin et al. 2020] for high-dim. optimal transport
even more effective.
( 2
min )
We consider the problem of optimizing expensive black-box functions over
high-dimensional combinatorial spaces which arises in many science,
engineering, and ML applications. We use Bayesian Optimization (BO) and propose
a novel surrogate modeling approach for efficiently handling a large number of
binary and categorical parameters. The key idea is to select a number of
discrete structures from the input space (the dictionary) and use them to
define an ordinal embedding for high-dimensional combinatorial structures. This
allows us to use existing Gaussian process models for continuous spaces. We
develop a principled approach based on binary wavelets to construct
dictionaries for binary spaces, and propose a randomized construction method
that generalizes to categorical spaces. We provide theoretical justification to
support the effectiveness of the dictionary-based embeddings. Our experiments
on diverse real-world benchmarks demonstrate the effectiveness of our proposed
surrogate modeling approach over state-of-the-art BO methods.
( 2
min )
Motivated by a variety of applications, high-dimensional time series have
become an active topic of research. In particular, several methods and
finite-sample theories for individual stable autoregressive processes with
known lag have become available very recently. We, instead, consider multiple
stable autoregressive processes that share an unknown lag. We use information
across the different processes to simultaneously select the lag and estimate
the parameters. We prove that the estimated process is stable, and we establish
rates for the forecasting error that can outmatch the known rate in our
setting. Our insights on the lag selection and the stability are also of
interest for the case of individual autoregressive processes.
( 2
min )
Large-scale linear models are ubiquitous throughout machine learning, with
contemporary application as surrogate models for neural network uncertainty
quantification; that is, the linearised Laplace method. Alas, the computational
cost associated with Bayesian linear models constrains this method's
application to small networks, small output spaces and small datasets. We
address this limitation by introducing a scalable sample-based Bayesian
inference method for conjugate Gaussian multi-output linear models, together
with a matching method for hyperparameter (regularisation) selection.
Furthermore, we use a classic feature normalisation method (the g-prior) to
resolve a previously highlighted pathology of the linearised Laplace method.
Together, these contributions allow us to perform linearised neural network
inference with ResNet-18 on CIFAR100 (11M parameters, 100 output dimensions x
50k datapoints) and with a U-Net on a high-resolution tomographic
reconstruction task (2M parameters, 251k output dimensions).
( 2
min )
Hamiltonian mechanics is one of the cornerstones of natural sciences.
Recently there has been significant interest in learning Hamiltonian systems in
a free-form way directly from trajectory data. Previous methods have tackled
the problem of learning from many short, low-noise trajectories, but learning
from a small number of long, noisy trajectories, whilst accounting for model
uncertainty has not been addressed. In this work, we present a Gaussian process
model for Hamiltonian systems with efficient decoupled parameterisation, and
introduce an energy-conserving shooting method that allows robust inference
from both short and long trajectories. We demonstrate the method's success in
learning Hamiltonian systems in various data settings.
( 2
min )
The article considers semi-supervised multitask learning on a Gaussian
mixture model (GMM). Using methods from statistical physics, we compute the
asymptotic Bayes risk of each task in the regime of large datasets in high
dimension, from which we analyze the role of task similarity in learning and
evaluate the performance gain when tasks are learned together rather than
separately. In the supervised case, we derive a simple algorithm that attains
the Bayes optimal performance.
( 2
min )
Last December, I completed my MS in Data Science. My capstone project had to do with semantic segmentation of medical ultrasound images (TLDR: cancer detection). I used a transformer model based on SegFormer. After the project was completed, I tried to improve the model performance a bit more.
I was surprised by the IoU performance, which seemed a little too good to be true. I ended up writing my own metrics which calculated IoU, Dice, precision, and recall, among other things. My IoU results, computed with my own code, were consistently less than the IoU results I got from the library I was using at the time - the Evaluate library from Hugging Face. But their IoU was equal to what my code computed as recall (sensitivity). I've opened a ticket with Hugging Face:
https://github.com/huggingface/evaluate/issues/421
They basically said they had copied that whole code from OpenMMLab and I should take it up with them. So I did:
https://github.com/open-mmlab/mmsegmentation/issues/2655
That was more than a week ago and there's still no reply. Meanwhile I've seen other bug reports which appear to point at the same problem:
https://github.com/open-mmlab/mmsegmentation/issues/2594
I'm pretty sure I am right. The definition of IoU is quite simple, and there isn't much room there for interpretation. Their code fails simple test cases.
My concern is - since they effectively calculate recall instead of IoU, and recall is larger than, or equal to IoU, and since the MMSegmentation library is widely used in image segmentation research, it's possible there are quite a few results floating out there in the literature that are a few percentage points larger than what they should be - e.g. 90% IoU instead of 85%.
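For reference, the two definitions differ only in the denominator, which is exactly why recall always upper-bounds IoU. A toy check with made-up masks (illustrative only, not my ultrasound data):

```python
# Flattened binary masks for a hypothetical 3x3 prediction/target pair.
pred   = [1, 1, 0, 0, 1, 0, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0, 0]

tp = sum(p and t for p, t in zip(pred, target))      # true positives
fp = sum(p and not t for p, t in zip(pred, target))  # false positives
fn = sum(t and not p for p, t in zip(pred, target))  # false negatives

iou    = tp / (tp + fp + fn)  # intersection over union
recall = tp / (tp + fn)       # a.k.a. sensitivity

# recall >= iou always holds, since tp + fn <= tp + fp + fn;
# dropping fp from the denominator is what inflates the reported "IoU".
```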
Thoughts?
submitted by /u/florinandrei
( 46
min )
Has anyone tried optimizing the forward and backward passes using custom CUDA code or fused kernels to speed up the training of current LLMs? I have only seen FasterTransformer (NVIDIA/FasterTransformer) and other similar tools, but they focus only on inference.
submitted by /u/Pretend_Ad3180
( 43
min )
Language models are statistical methods predicting the succession of tokens in sequences, using natural text. Large language models (LLMs) are neural network-based language models with hundreds of millions (BERT) to over a trillion parameters (MiCS), and whose size makes single-GPU training impractical. LLMs’ generative abilities make them popular for text synthesis, summarization, machine translation, and […]
( 18
min )
Back in 2018, I had the privilege of keynoting at one of Semantic Web Company’s events in Vienna, as well as attending the full event. It was a great opportunity to immerse myself in the Central European perspective on the utility of Linked Open Data standards and how those standards were being applied. I got… Read More »FAIR Content: Better Chatbot Answers and Content Reusability at Scale
The post FAIR Content: Better Chatbot Answers and Content Reusability at Scale appeared first on Data Science Central.
( 21
min )
Hi everyone. I have tested RWKV [loss vs token position] for 10,000 ctx4k+ documents in the Pile:
https://preview.redd.it/3ld2629h6xla1.png?width=941&format=png&auto=webp&s=008cb5eab35b86c3d9dc2378b1b78bdc98f50120
RWKV 1B5-4k is mostly flat after ctx1500, but 3B-4k and 7B-4k and 14B-4k have some slopes, and they are getting better. This debunks the old view that RNNs cannot model long ctxlens. These ctx4096 models are available at https://huggingface.co/BlinkDL.
We can predict that RWKV 100B will be great, and RWKV 1T is probably all you need :)
https://preview.redd.it/e3tbivtx6xla1.png?width=1174&format=png&auto=webp&s=53767f2e857edd429223472c0b67ef9ca31f2aa5
RWKV is simple. You can read https://arxiv.org/abs/2302.13939 (SpikeGPT) which is inspired by RWKV and has plenty of explanations. …
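Why an RNN's per-token cost is independent of ctxlen can be seen from the shape of the recurrence: the entire history is folded into a fixed-size state. A generic toy recurrence (illustrative only, not the actual RWKV kernel):

```python
def rnn_step(state, token_emb, decay=0.9):
    # One step touches only the fixed-size state, so per-token compute and
    # memory are O(state size) no matter how many tokens came before.
    new_state = [decay * s + x for s, x in zip(state, token_emb)]
    output = sum(new_state)  # stand-in readout
    return new_state, output

state = [0.0] * 4  # state size fixed up front, independent of ctxlen
for tok in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]):
    state, out = rnn_step(state, tok)
```

A Transformer, by contrast, attends over all previous tokens, so its per-token cost grows with context length.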
( 47
min )
https://github.com/danthelion/talksheet
A small project showcasing how to create a "self-serve" analytical application, powered by the wonderful LangChain and DuckDB.
There are a bunch of features (like supporting other file formats such as parquet and json) planned for the future, just wanted to ship something quickly.
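The core "SQL over tabular data" step can be sketched with the stdlib alone (using sqlite3 as a stand-in for DuckDB, with a made-up tiny CSV; talksheet's LangChain layer additionally translates natural-language questions into the SQL):

```python
import csv
import io
import sqlite3

csv_text = "name,score\nalice,3\nbob,5\n"  # hypothetical dataset

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, score INTEGER)")
rows = list(csv.DictReader(io.StringIO(csv_text)))
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(r["name"], int(r["score"])) for r in rows])

# The SQL below is what the LLM layer would generate from a question
# like "who has the highest score?"
top = conn.execute("SELECT name, MAX(score) FROM t").fetchone()
```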
submitted by /u/dan_the_lion
( 43
min )
I have seen a lot of videos like this one, which consist of Biden, Obama and Trump gaming together while roasting each other. Does anyone have an idea of what tool is used for this?
Thank you 🙏🏼
submitted by /u/ElonJuniorMusk
( 41
min )
Hi,
I was wondering if there is an AI which can create slides from images.
Example: a screenshot of a slide as input.
I am not looking for something like https://www.beautiful.ai/ , but rather something that creates the elements in Google Slides, which I could then arrange.
Thank you!
Example image
https://preview.redd.it/ca698mhgvqla1.png?width=1280&format=png&auto=webp&s=03a19d1df97855cb2d260f09b441a6fa8327a9ca
submitted by /u/rubicscube11
( 41
min )
https://www.notabot.tech/subscribe?ref=iBUStIpICm
An AI newsletter made by Haroon Choudery. Keeps me up to date on all the juicy AI news! 🤖
Post Your Opinions!
submitted by /u/Muatangz
( 41
min )
There is a field of modelling called "Survival Analysis" (https://en.wikipedia.org/wiki/Survival_analysis), in which the objective is to model the effect of different "characteristics" (e.g. medical measurements of patients such as height, age, weight, etc.) on the "time of some event" (e.g. death). Many models used in Survival Analysis are essentially a form of "Regression Models" (https://en.wikipedia.org/wiki/Regression_analysis) - and of course, these models are built, trained and fine-tuned using some optimization algorithm (e.g. Newton-Raphson).
One of the most popular types of models used in Survival Analysis is called the "Cox Proportional-Hazards" Model (https://en.wikipedia.org/wiki/Proportional_hazards_model). As an example, here I have fit a Cox-PH Model to some dataset using th…
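The defining property of the Cox model is that the hazard factorizes as h(t|x) = h0(t) * exp(beta * x), so the hazard ratio between two subjects does not depend on t. A toy illustration with a made-up coefficient (not a fitted model):

```python
import math

beta = 0.05  # hypothetical fitted coefficient for one covariate, e.g. age

def hazard_ratio(x_a, x_b, beta=beta):
    # Under proportional hazards h(t|x) = h0(t) * exp(beta * x), the baseline
    # hazard h0(t) cancels, so the ratio is constant over time:
    # h(t|x_a) / h(t|x_b) = exp(beta * (x_a - x_b))
    return math.exp(beta * (x_a - x_b))

hr = hazard_ratio(60, 50)  # hazard ratio for a 10-year age difference
```

This cancellation is also why the Cox partial likelihood can be maximized without ever estimating h0(t).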
( 47
min )
All throughout the world, industrial processes are being increasingly redefined by IoT and AI. Smart energy grids, predictive maintenance sensors, and wearable gadgets like smartwatches and AR/VR goggles—IoT and AI have combined to unleash the potential of data quicker than ever. No sector of the economy is exempt from the advantages that IoT and AI… Read More »Power of AI Automation In Agritech: Everything You Need To Know For Your Business
The post Power of AI Automation In Agritech: Everything You Need To Know For Your Business appeared first on Data Science Central.
( 20
min )
Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should […]
Machine learning (ML) can help companies make better business decisions through advanced analytics. Companies across industries apply ML to use cases such as predicting customer churn, demand forecasting, credit scoring, predicting late shipments, and improving manufacturing quality. In this blog post, we’ll look at how Amazon SageMaker Canvas delivers faster and more accurate model training times enabling […]
MIT researchers trained logic-aware language models to reduce harmful stereotypes like gender and racial biases.
The long-running programming competition encourages skills and friendships that last a lifetime.
Here is our podcast episode with Sergey Levine from UC Berkeley where we discussed the evolution of deep reinforcement learning, how previous robotics approaches were replaced, and why offline RL is significant for future generalization.
The race toward sentient AI is on. A combination of hubris and competition between governments and societies, akin to an arms race, virtually ensures 'sentient' AI/AGI/ASI will be developed in relatively short order. There is increasing evidence, such as the Othello paper, that is already upending the auto-complete narrative. LLMs having a world model implies theory of mind, and thus at least functional consciousness (albeit quantized for the time being), which likely in turn confers some form of partial non-anthropomorphic sentience, which will at some point open an ethical, societal, and religious Pandora's box (see the Bodhisattva vow). The only thing we don't know is just how far down this slippery slope we are at the moment. It's also hard to argue against the runaway AI effect in …
Financial market participants are faced with an overload of information that influences their decisions, and sentiment analysis stands out as a useful tool to help separate out the relevant and meaningful facts and figures. However, the same piece of news can have a positive or negative impact on stock prices, which presents a challenge for […]
Amazon Kendra is an easy-to-use intelligent search service that allows you to integrate search capabilities with your applications so users can find information stored across data sources like Amazon Simple Storage Service, OneDrive, and Google Drive; applications such as Salesforce, SharePoint, and ServiceNow; and relational databases like Amazon Relational Database Service (Amazon RDS). Using […]
March is already here and a new month always means new games, with a total of 19 joining the GeForce NOW library. Set off on a magical journey to restore Disney magic when Disney Dreamlight Valley joins the cloud later this month. Plus, the hunt is on with Capcom's Monster Hunter Rise now available.
These days, everyone is excited about the Metaverse. The hype that the Metaverse has created over the past few years is exceptional. The Metaverse will give a whole new gaming experience to its users. In the Metaverse, an immersive virtual world is created, in which users can play in a real-world setting with special effects with the help of VR and…
The post Metaverse in Gaming: Revolution In Gaming industry With Next-Generation Experience appeared first on Data Science Central.
A process that seeks feedback from human specialists proves more effective at optimization than automated systems working alone.
Appendicitis is among the most frequent reasons for pediatric abdominal
surgeries. With recent advances in machine learning, data-driven decision
support could help clinicians diagnose and manage patients while reducing the
number of non-critical surgeries. Previous decision support systems for
appendicitis focused on clinical, laboratory, scoring and computed tomography
data, mainly ignoring abdominal ultrasound, a noninvasive and readily available
diagnostic modality. To this end, we developed and validated interpretable
machine learning models for predicting the diagnosis, management and severity
of suspected appendicitis using ultrasound images. Our models were trained on a
dataset comprising 579 pediatric patients with 1709 ultrasound images
accompanied by clinical and laboratory data. Our methodological contribution is
the generalization of concept bottleneck models to prediction problems with
multiple views and incomplete concept sets. Notably, such models lend
themselves to interpretation and interaction via high-level concepts
understandable to clinicians without sacrificing performance or requiring
time-consuming image annotation when deployed.
In computer vision, it is often observed that formulating regression problems
as classification tasks yields better performance. We investigate this
curious phenomenon and provide a derivation showing that classification, with
the cross-entropy loss, outperforms regression with a mean squared error loss
in its ability to learn high-entropy feature representations. Based on this
analysis, we propose an ordinal entropy loss to encourage higher-entropy
feature spaces while maintaining ordinal relationships, improving the
performance of regression tasks. Experiments on synthetic and real-world
regression tasks demonstrate the importance and benefits of increasing entropy
for regression.
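The regression-as-classification recipe the abstract refers to can be sketched in a few lines: discretize the continuous target into bins (the classifier's classes), train with cross-entropy, and decode a prediction as the expectation over bin centers. This is a generic illustration, not the paper's exact setup; the bin count and range are arbitrary choices:

```python
import numpy as np

def make_bins(y_min, y_max, k):
    """k equal-width bins over [y_min, y_max] with their centers."""
    edges = np.linspace(y_min, y_max, k + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return edges, centers

def encode(y, edges):
    """Map each continuous target to a class index (a cross-entropy target)."""
    return np.clip(np.digitize(y, edges[1:-1]), 0, len(edges) - 2)

def decode(probs, centers):
    """Turn predicted class probabilities back into a scalar estimate
    as the expectation over bin centers."""
    return probs @ centers

edges, centers = make_bins(0.0, 10.0, 50)   # bin width 0.2
y = np.array([0.3, 4.71, 9.9])
one_hot = np.eye(len(centers))[encode(y, edges)]
y_hat = decode(one_hot, centers)            # recovers y up to bin resolution
```

The round trip loses at most half a bin width (0.1 here), which is the basic quantization cost the classification formulation pays in exchange for the richer cross-entropy training signal.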
We propose a new high-performance activation function, Moderate Adaptive
Linear Units (MoLU), for deep neural networks. The MoLU is a simple,
beautiful and powerful activation function that can serve as a good main
activation function among the hundreds of activation functions. Because the
MoLU is made up of elementary functions, not only is it an infinite
diffeomorphism (i.e. smooth and infinitely differentiable over the whole
domain), but it also decreases training time.
Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.
Recently, score-based generative models have been successfully employed for
the task of speech enhancement. A stochastic differential equation is used to
model the iterative forward process, where at each step environmental noise and
white Gaussian noise are added to the clean speech signal. While in the limit the
mean of the forward process ends at the noisy mixture, in practice it stops
earlier and thus only at an approximation of the noisy mixture. This results in
a discrepancy between the terminating distribution of the forward process and
the prior used for solving the reverse process at inference. In this paper, we
address this discrepancy. To this end, we propose a forward process based on a
Brownian bridge and show that such a process leads to a reduction of the
mismatch compared to previous diffusion processes. More importantly, we show
that our approach improves in objective metrics over the baseline process with
only half of the iteration steps and having one hyperparameter less to tune.
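The key property of the Brownian bridge the abstract exploits is that the process mean is pinned to the target exactly at the terminal time, removing the prior mismatch. A minimal Euler-Maruyama simulation (illustrative only; not the paper's exact SDE or noise schedule):

```python
import numpy as np

def simulate_brownian_bridge(x0, xT, T=1.0, sigma=0.5, n_steps=1000, seed=0):
    """Simulate a Brownian bridge from x0 to xT: dX = (xT - X)/(T - t) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = x0
    for i in range(n_steps - 1):          # stop one step early: drift blows up at t = T
        t = i * dt
        drift = (xT - x) / (T - t)        # pulls the state toward the endpoint
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

end = simulate_brownian_bridge(0.0, 3.0)  # lands essentially at 3.0
```

Unlike a plain Ornstein-Uhlenbeck-style forward process, the bridge's terminal distribution concentrates on the target itself, which is why the reverse process can start from the noisy mixture without an approximation gap.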
Adversarial training is a standard technique for training adversarially
robust models. In this paper, we study adversarial training as an alternating
best-response strategy in a 2-player zero-sum game. We prove that even in a
simple scenario of a linear classifier and a statistical model that abstracts
robust vs. non-robust features, the alternating best-response strategy of such
a game may not converge. On the other hand, a unique pure Nash equilibrium of the
game exists and is provably robust. We support our theoretical results with
experiments, showing the non-convergence of adversarial training and the
robustness of Nash equilibrium.
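The alternating best-response dynamic the abstract analyzes can be sketched on a linear classifier: the attacker perturbs inputs with an FGSM-style step against the current weights, then the learner takes a logistic-loss gradient step on the perturbed data. This is an illustrative toy, not the paper's exact statistical model; the data, epsilon, and step size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps, lr = 400, 5, 0.1, 0.5
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_true)                       # labels in {-1, +1}, separable

w = np.zeros(d)
for _ in range(200):
    # attacker's best response in an L-inf ball: x + eps * sign(grad_x loss),
    # which for a linear model is x - eps * y * sign(w)
    X_adv = X - eps * y[:, None] * np.sign(w)
    margins = y * (X_adv @ w)
    sig = 1.0 / (1.0 + np.exp(np.clip(margins, -60, 60)))  # sigmoid(-margin), clipped for safety
    # learner's response: logistic-loss gradient step on the perturbed data
    grad = -(y[:, None] * X_adv * sig[:, None]).mean(0)
    w -= lr * grad

clean_acc = np.mean(np.sign(X @ w) == y)      # accuracy on unperturbed data
```

On this separable toy problem the learner still reaches high clean accuracy; the non-convergence phenomenon the paper proves shows up in the weight dynamics, not necessarily in accuracy.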
In reinforcement learning for safety-critical settings, it is often desirable
for the agent to obey safety constraints at all points in time, including
during training. We present a novel neurosymbolic approach called SPICE to
solve this safe exploration problem. SPICE uses an online shielding layer based
on symbolic weakest preconditions to achieve a more precise safety analysis
than existing tools without unduly impacting the training process. We evaluate
the approach on a suite of continuous control benchmarks and show that it can
achieve comparable performance to existing safe learning techniques while
incurring fewer safety violations. Additionally, we present theoretical results
showing that SPICE converges to the optimal safe policy under reasonable
assumptions.
Inverse molecular design is critical in material science and drug discovery,
where the generated molecules should satisfy certain desirable properties. In
this paper, we propose equivariant energy-guided stochastic differential
equations (EEGSDE), a flexible framework for controllable 3D molecule
generation under the guidance of an energy function in diffusion models.
Formally, we show that EEGSDE naturally exploits the geometric symmetry in 3D
molecular conformation, as long as the energy function is invariant to
orthogonal transformations. Empirically, under the guidance of designed energy
functions, EEGSDE significantly improves the baseline on QM9, in inverse
molecular design targeted to quantum properties and molecular structures.
Furthermore, EEGSDE is able to generate molecules with multiple target
properties by combining the corresponding energy functions linearly.
Temporal distributional shifts, with underlying dynamics changing over time,
frequently occur in real-world time series and pose a fundamental challenge for
deep neural networks (DNNs). In this paper, we propose a novel deep sequence
model based on the Koopman theory for time series forecasting: Koopman Neural
Forecaster (KNF) which leverages DNNs to learn the linear Koopman space and the
coefficients of chosen measurement functions. KNF imposes appropriate inductive
biases for improved robustness against distributional shifts, employing both a
global operator to learn shared characteristics and a local operator to capture
changing dynamics, as well as a specially-designed feedback loop to
continuously update the learned operators over time for rapidly varying
behaviors. We demonstrate that KNF achieves superior performance compared
to the alternatives, on multiple time series datasets that are shown to suffer
from distribution shifts.
A kernel-based quantum classifier is the most practical and influential
quantum machine learning technique for the hyper-linear classification of
complex data. We propose a Variational Quantum Approximate Support Vector
Machine (VQASVM) algorithm that demonstrates empirical sub-quadratic run-time
complexity with quantum operations feasible even on NISQ computers. We
tested our algorithm on a toy dataset using cloud-based NISQ
machines as a proof of concept. We also numerically investigated its
performance on the standard Iris flower and MNIST datasets to confirm the
practicality and scalability.
We analyze a large corpus of police incident narrative documents to
understand the spatial distribution of their topics. The motivation for doing
this is that the police narrative in each incident report contains very
fine-grained information that is richer than the category manually
assigned by the police. Our approach is to split the corpus into topics using
two different unsupervised machine learning algorithms - Latent Dirichlet
Allocation and Non-negative Matrix Factorization. We validate the performance
of each learned topic model using model coherence. Then, using a k-nearest
neighbors density ratio estimation (kNN-DRE) approach that we propose, we
estimate the spatial density ratio per topic and use this for data discovery
and analysis of each topic, allowing for insights into the described incidents
at scale. We provide a qualitative assessment of each topic and highlight some
key benefits for using our kNN-DRE model for estimating spatial trends.
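A k-nearest-neighbors density ratio estimate like the kNN-DRE above can be built from the classic kNN density estimator p(x) ~ k / (n * V * d_k(x)^dim), so the ratio reduces to a ratio of kth-neighbor distances. This is the generic construction, sketched in numpy; the paper's exact estimator may differ in details:

```python
import numpy as np

def knn_radius(query, data, k):
    """Distance from each query point to its k-th nearest neighbor in data."""
    d = np.linalg.norm(query[:, None, :] - data[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]

def knn_density_ratio(query, num, den, k=50):
    """Estimate r(x) = p_num(x) / p_den(x) from samples of each distribution."""
    dim = query.shape[1]
    r_num = knn_radius(query, num, k)
    r_den = knn_radius(query, den, k)
    # kNN density estimate: p(x) ~ k / (n * V_dim * d_k^dim); constants cancel
    return (len(den) / len(num)) * (r_den / r_num) ** dim

rng = np.random.default_rng(0)
num = rng.normal(0.0, 1.0, size=(2000, 2))   # numerator samples: N(0, I)
den = rng.normal(0.0, 1.0, size=(2000, 2))   # denominator samples: same N(0, I)
ratio = knn_density_ratio(np.zeros((1, 2)), num, den)  # should be near 1
```

When the two sample sets come from the same distribution the estimated ratio hovers around 1, which is the sanity check one would run before mapping per-topic spatial trends.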
In this paper, we study the generalization performance of global minima for
implementing empirical risk minimization (ERM) on over-parameterized deep ReLU
nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove
that there exist perfect global minima achieving almost optimal generalization
error bounds for numerous types of data under mild conditions. Since
over-parameterization is crucial to guarantee that the global minima of ERM on
deep ReLU nets can be realized by the widely used stochastic gradient descent
(SGD) algorithm, our results indeed fill a gap between optimization and
generalization.
Fixing energy leakage caused by different anomalies can result in significant
energy savings and extended appliance life. Further, it assists grid operators
in scheduling their resources to meet the actual needs of end users, while
helping end users reduce their energy costs. In this paper, we analyze the
patterns pertaining to the power consumption of dishwashers used in two houses
of the REFIT dataset. Then two autoencoders (AEs), with a 1D-CNN and a TCN as
backbones, are trained to differentiate normal patterns from abnormal
ones. Our results indicate that the TCN outperforms the 1D-CNN in detecting anomalies in
energy consumption. Finally, the data from the Fridge_Freezer and the Freezer
of house No. 3 in REFIT is also used to evaluate our approach.
Audio Spectrogram Transformer models rule the field of Audio Tagging,
outrunning previously dominating Convolutional Neural Networks (CNNs). Their
superiority is based on the ability to scale up and exploit large-scale
datasets such as AudioSet. However, Transformers are demanding in terms of
model size and computational requirements compared to CNNs. We propose a
training procedure for efficient CNNs based on offline Knowledge Distillation
(KD) from high-performing yet complex transformers. The proposed training
schema and the efficient CNN design based on MobileNetV3 result in models
outperforming previous solutions in terms of parameter and computational
efficiency and prediction performance. We provide models of different
complexity levels, scaling from low-complexity models up to a new
state-of-the-art performance of .483 mAP on AudioSet. Source Code available at:
https://github.com/fschmid56/EfficientAT
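The standard offline KD objective behind setups like this trains the student on a convex combination of hard-label cross-entropy and cross-entropy to the teacher's temperature-softened predictions (which equals the KL divergence up to a constant). A minimal numpy sketch; the temperature and mixing weight here are illustrative, not the paper's settings:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    p_t = softmax(teacher_logits, T)                  # soft teacher targets
    log_p_s = np.log(softmax(student_logits, T))
    kd = -(p_t * log_p_s).sum(-1).mean() * T * T      # T^2 keeps gradient scale
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * kd + (1 - alpha) * hard

s = np.array([[2.0, 0.5, -1.0]])   # student logits
t = np.array([[3.0, 1.0, -2.0]])   # teacher logits
loss = distillation_loss(s, t, labels=np.array([0]))
```

A student whose logits match the teacher's incurs a lower loss than one pointing the opposite way, which is exactly the signal that lets a small CNN inherit the transformer's predictions.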
Self-supervised learning has significantly improved the performance of many
NLP tasks. However, how self-supervised learning discovers useful
representations, and why it is better than traditional approaches such as
probabilistic models, are still largely unknown. In this paper, we focus on the
context of topic modeling and highlight a key advantage of self-supervised
learning - when applied to data generated by topic models, self-supervised
learning can be oblivious to the specific model, and hence is less susceptible
to model misspecification. In particular, we prove that commonly used
self-supervised objectives based on reconstruction or contrastive samples can
both recover useful posterior information for general topic models.
Empirically, we show that the same objectives can perform on par with posterior
inference using the correct model, while outperforming posterior inference
using misspecified models.
Ridesharing platforms are a type of two-sided marketplace where
``supply-demand balance'' is critical for market efficiency and yet is complex
to define and analyze. We present a unified analytical framework based on the
graph-based equilibrium metric (GEM) for quantifying the supply-demand
spatiotemporal state and efficiency of a ridesharing marketplace. GEM was
developed as a generalized Wasserstein distance between the supply and demand
distributions in a ridesharing market and has been used as an evaluation metric
for algorithms expected to improve supply-demand alignment. Building upon GEM,
we develop SD-GEM, a dual-perspective (supply- and demand-side) representation
of rideshare market equilibrium. We show that there are often disparities
between the two views and examine how this dual-view leads to the notion of
market efficiency, in which we propose novel statistical tests for capturing
improvement and explaining the underlying driving factors.
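GEM is described as a generalized Wasserstein distance between supply and demand distributions. The one-dimensional special case makes the idea concrete: for equal-size empirical samples on the line, the optimal transport plan matches order statistics, so W1 is just the mean absolute difference of the sorted samples. An illustrative toy only; GEM itself operates on graphs and spatiotemporal distributions:

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 between two equal-size empirical distributions on the line:
    the optimal coupling in 1-D matches sorted samples pairwise."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

supply = np.array([0.0, 1.0, 2.0, 3.0])
demand = np.array([0.5, 1.5, 2.5, 3.5])   # demand shifted right by 0.5
dist = wasserstein_1d(supply, demand)      # -> 0.5
```

A zero distance means supply exactly covers demand; a growing distance quantifies the spatial imbalance a repositioning algorithm would try to reduce.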
Federated Learning (FL) has emerged as an important machine learning paradigm and
has received rapidly increasing research interest from the community. However,
catastrophic forgetting caused by data heterogeneity and partial participation
poses distinctive challenges for FL and is detrimental to performance.
To tackle these problems, we propose a new FL approach (namely GradMA), which
takes inspiration from continual learning to simultaneously correct the
server-side and worker-side update directions and to take full advantage of
the server's rich computing and memory resources. Furthermore, we elaborate a
memory reduction strategy to enable GradMA to accommodate FL with a large number
of workers. We then analyze the convergence of GradMA theoretically in the
smooth non-convex setting and show that its convergence rate achieves a linear
speed-up w.r.t. the number of sampled active workers. Finally, our
extensive experiments on various image classification tasks show that GradMA
achieves significant performance gains in accuracy and communication efficiency
compared to SOTA baselines.
Estimation of the complete distribution of a random variable is a useful
primitive for both manual and automated decision making. This problem has
received extensive attention in the i.i.d. setting, but the arbitrary data
dependent setting remains largely unaddressed. Consistent with known
impossibility results, we present computationally felicitous time-uniform and
value-uniform bounds on the CDF of the running averaged conditional
distribution of a real-valued random variable which are always valid and
sometimes trivial, along with an instance-dependent convergence guarantee. The
importance-weighted extension is appropriate for estimating complete
counterfactual distributions of rewards given controlled experimentation data
exhaust, e.g., from an A/B test or a contextual bandit.
Graph neural networks (GNNs) have been applied to a large variety of
applications in materials science and chemistry. Here, we recapitulate the
graph construction for crystalline (periodic) materials and investigate its
impact on GNN model performance. We suggest the asymmetric unit cell as a
representation to reduce the number of atoms by using all symmetries of the
system. With a simple but systematically built GNN architecture based on
message passing and line graph templates, we furthermore introduce a general
architecture (Nested Graph Network, NGN) that is applicable to a wide range of
tasks and systematically improves state-of-the-art results on the MatBench
benchmark datasets.
This paper introduces a new sparse Bayesian learning (SBL) algorithm that
jointly recovers a temporal sequence of edge maps from noisy and under-sampled
Fourier data. The new method is cast in a Bayesian framework and uses a prior
that combines intra-image information, to promote sparsity in
each individual edge map, with inter-image information, to promote similarities
in any unchanged regions. By treating both the edges as well as the similarity
between adjacent images as random variables, there is no need to separately
form regions of change. Thus we avoid both additional computational cost as
well as any information loss resulting from pre-processing the image. Our
numerical examples demonstrate that our new method compares favorably with more
standard SBL approaches.
We propose a class of models based on Fisher's Linear Discriminant (FLD) in
the context of domain adaptation. The class is the convex combination of two
hypotheses: i) an average hypothesis representing previously seen source tasks
and ii) a hypothesis trained on a new target task. For a particular generative
setting we derive the optimal convex combination of the two models under 0-1
loss, propose a computable approximation, and study the effect of various
parameter settings on the relative risks between the optimal hypothesis,
hypothesis i), and hypothesis ii). We demonstrate the effectiveness of the
proposed optimal classifier in the context of EEG- and ECG-based classification
settings and argue that the optimal classifier can be computed without access
to direct information from any of the individual source tasks. We conclude by
discussing further applications, limitations, and possible future directions.
We study the consequences of mode-collapse of normalizing flows in the
context of lattice field theory. Normalizing flows allow for independent
sampling. For this reason, it is hoped that they can avoid the tunneling
problem of local-update MCMC algorithms for multi-modal distributions. In this
work, we first point out that the tunneling problem is also present for
normalizing flows but is shifted from the sampling to the training phase of the
algorithm. Specifically, normalizing flows often suffer from mode-collapse for
which the training process assigns vanishingly low probability mass to relevant
modes of the physical distribution. This may result in a significant bias when
the flow is used as a sampler in a Markov-Chain or with Importance Sampling. We
propose a metric to quantify the degree of mode-collapse and derive a bound on
the resulting bias. Furthermore, we propose various mitigation strategies in
particular in the context of estimating thermodynamic observables, such as the
free energy.
This study addresses the problem of performing clustering in the presence of
two types of background knowledge: pairwise constraints and monotonicity
constraints. To achieve this, the formal framework to perform clustering under
monotonicity constraints is, firstly, defined, resulting in a specific distance
measure. Pairwise constraints are integrated afterwards by designing an
objective function which combines the proposed distance measure and a pairwise
constraint-based penalty term, in order to fuse both types of information. This
objective function can be optimized with an EM optimization scheme. The
proposed method serves as the first approach to the problem it addresses, as it
is the first method designed to work with the two types of background knowledge
mentioned above. Our proposal is tested on a variety of benchmark datasets and
on a real-world case study.
Automatic recommendation systems based on deep neural networks have become
extremely popular during the last decade. Some of these systems can however be
used for applications which are ranked as High Risk by the European Commission
in the A.I. act, as for instance for online job candidate recommendation. When
used in the European Union, commercial AI systems for this purpose will then be
required to have proper statistical properties with regard to the potential
discrimination they could engender. This motivated our contribution, where we
present a novel optimal transport strategy to mitigate undesirable algorithmic
biases in multi-class neural-network classification. Our strategy is
model-agnostic and can be used on any multi-class classification neural-network
model. To anticipate the certification of recommendation systems using textual
data, we then used it on the Bios dataset, for which the learning task consists
in predicting the occupation of female and male individuals, based on their
LinkedIn biography. Results show that it can reduce undesired algorithmic
biases in this context to lower levels than a standard strategy.
We introduce a new methodology dubbed ``safe peeling'' to accelerate the
resolution of l0-regularized least-squares problems via a Branch-and-Bound
(BnB) method. Our procedure tightens the convex relaxation considered
at each node of the BnB decision tree and therefore potentially allows for more
aggressive pruning. Numerical simulations show that our proposed methodology
leads to significant gains in terms of number of nodes explored and overall
solving time.
Bayesian experimental design (BED) provides a powerful and general framework
for optimizing the design of experiments. However, its deployment often poses
substantial computational challenges that can undermine its practical use. In
this review, we outline how recent advances have transformed our ability to
overcome these challenges and thus utilize BED effectively, before discussing
some key areas for future development in the field.
We consider the problem of tracking an unknown time varying parameter that
characterizes the probabilistic evolution of a sequence of independent
observations. To this aim, we propose a stochastic gradient descent-based
recursive scheme in which the log-likelihood of the observations acts as a
time-varying gain function. We prove convergence in mean-square error in a suitable
neighbourhood of the unknown time varying parameter and illustrate the details
of our findings in the case where data are generated from distributions
belonging to the exponential family.
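For a Gaussian mean (an exponential-family example of the kind the abstract mentions), the log-likelihood gradient for x_t ~ N(theta_t, 1) is simply (x_t - est), so the recursive scheme reduces to an exponential tracker est <- est + gamma * (x_t - est). A hedged numpy sketch with an illustrative sinusoidal drift (not the paper's exact setting):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, n = 0.05, 4000
est = 0.0
sq_errs = []
for t in range(n):
    theta_t = np.sin(2 * np.pi * t / n)       # unknown time-varying mean
    x_t = theta_t + rng.standard_normal()     # noisy observation
    est = est + gamma * (x_t - est)           # SGD step on the log-likelihood
    if t > n // 2:                            # measure after burn-in
        sq_errs.append((est - theta_t) ** 2)
mse = float(np.mean(sq_errs))
```

The tracking MSE sits far below the unit observation noise: the step size gamma trades noise averaging (small gamma) against tracking lag (large gamma), which is the tension the convergence analysis quantifies.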
In an effort to address the training instabilities of GANs, we introduce a
class of dual-objective GANs with different value functions (objectives) for
the generator (G) and discriminator (D). In particular, we model each objective
using $\alpha$-loss, a tunable classification loss, to obtain
$(\alpha_D,\alpha_G)$-GANs, parameterized by $(\alpha_D,\alpha_G)\in
[0,\infty)^2$. For a sufficiently large number of samples and capacities for G
and D, we show that the resulting non-zero sum game simplifies to minimizing an
$f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$. In the
finite sample and capacity setting, we define estimation error to quantify the
gap in the generator's performance relative to the optimal setting with
infinite samples and obtain upper bounds on this error, showing it to be order
optimal under certain conditions. Finally, we highlight the value of tuning
$(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic
2D Gaussian mixture ring and the Stacked MNIST datasets.
Forest-based methods have recently gained in popularity for non-parametric
treatment effect estimation. Building on this line of work, we introduce causal
survival forests, which can be used to estimate heterogeneous treatment effects
in a survival and observational setting where outcomes may be right-censored.
Our approach relies on orthogonal estimating equations to robustly adjust for
both censoring and selection effects under unconfoundedness. In our
experiments, we find our approach to perform well relative to a number of
baselines.
( 2
min )
Automatic recommendation systems based on deep neural networks have become
extremely popular during the last decade. Some of these systems can however be
used for applications which are ranked as High Risk by the European Commission
in the A.I. act, as for instance for online job candidate recommendation. When
used in the European Union, commercial AI systems for this purpose will then be
required to have proper statistical properties with regard to the potential
discrimination they could engender. This motivated our contribution, where we
present a novel optimal transport strategy to mitigate undesirable algorithmic
biases in multi-class neural-network classification. Our strategy is
model-agnostic and can be used on any multi-class classification neural-network
model. To anticipate the certification of recommendation systems using textual
data, we then used it on the Bios dataset, for which the learning task consists
in predicting the occupation of female and male individuals, based on their
LinkedIn biography. Results show that it can reduce undesired algorithmic
biases in this context to lower levels than a standard strategy.
( 2
min )
Kernel methods, being supported by a well-developed theory and coming with
efficient algorithms, are among the most popular and successful machine
learning techniques. From a mathematical point of view, these methods rest on
the concept of kernels and function spaces generated by kernels, so-called
reproducing kernel Hilbert spaces. Motivated by recent developments of learning
approaches in the context of interacting particle systems, we investigate
kernel methods acting on data with many measurement variables. We show the
rigorous mean field limit of kernels and provide a detailed analysis of the
limiting reproducing kernel Hilbert space. Furthermore, several examples of
kernels that allow a rigorous mean field limit are presented.
( 2
min )
Semi-supervised learning aims to train a model using limited labels.
State-of-the-art semi-supervised methods for image classification such as PAWS
rely on self-supervised representations learned with large-scale unlabeled but
curated data. However, PAWS is often less effective when using real-world
unlabeled data that is uncurated, e.g., contains out-of-class data. We propose
RoPAWS, a robust extension of PAWS that can work with real-world unlabeled
data. We first reinterpret PAWS as a generative classifier that models
densities using kernel density estimation. From this probabilistic perspective,
we calibrate its prediction based on the densities of labeled and unlabeled
data, which leads to a simple closed-form solution from the Bayes' rule. We
demonstrate that RoPAWS significantly improves PAWS for uncurated Semi-iNat by
+5.3% and curated ImageNet by +0.4%.
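As a generic illustration of the probabilistic view described above (not the authors' implementation), a generative classifier can score a point with per-class kernel density estimates and combine them through Bayes' rule; the kernel bandwidth and toy data here are arbitrary:

```python
import math

def gaussian_kernel(x, xi, h=1.0):
    # Unnormalized Gaussian kernel; the normalizer cancels in the posterior
    return math.exp(-((x - xi) ** 2) / (2 * h * h))

def kde(x, samples, h=1.0):
    # Kernel density estimate of p(x | class) from that class's samples
    return sum(gaussian_kernel(x, xi, h) for xi in samples) / len(samples)

def posterior(x, class_samples, priors):
    # Bayes' rule: p(y | x) proportional to p(x | y) * p(y)
    scores = {y: kde(x, s) * priors[y] for y, s in class_samples.items()}
    z = sum(scores.values())
    return {y: v / z for y, v in scores.items()}

post = posterior(0.1, {'a': [0.0, 0.2], 'b': [5.0, 5.5]}, {'a': 0.5, 'b': 0.5})
```

A query point near class 'a' samples gets nearly all of the posterior mass.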
( 2
min )
Partitioning a set of elements into subsets of a priori unknown sizes is
essential in many applications. These subset sizes are rarely explicitly
learned - be it the cluster sizes in clustering applications or the number of
shared versus independent generative latent factors in weakly-supervised
learning. Probability distributions over correct combinations of subset sizes
are non-differentiable due to hard constraints, which prohibit gradient-based
optimization. In this work, we propose the differentiable hypergeometric
distribution. The hypergeometric distribution models the probability of
different group sizes based on their relative importance. We introduce
reparameterizable gradients to learn the importance between groups and
highlight the advantage of explicitly learning the size of subsets in two
typical applications: weakly-supervised learning and clustering. In both
applications, we outperform previous approaches, which rely on suboptimal
heuristics to model the unknown size of groups.
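For background, the (non-differentiable) multivariate hypergeometric pmf that the paper relaxes can be computed with standard combinatorics; the reparameterizable relaxation itself is the paper's contribution and is not reproduced here:

```python
from math import comb

def hypergeom_pmf(draws, group_sizes):
    # P(drawing exactly draws[i] items from group i, when sum(draws) items
    # are drawn uniformly without replacement from all groups together)
    n, k = sum(group_sizes), sum(draws)
    num = 1
    for d, g in zip(draws, group_sizes):
        num *= comb(g, d)
    return num / comb(n, k)

p = hypergeom_pmf([1, 1], [2, 2])  # one item from each of two groups of size 2
```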
( 2
min )
The most recent multi-source covariate shift algorithm is an efficient
hyperparameter optimization algorithm for missing target output. In this paper,
we extend this algorithm to the framework of federated learning. For data
islands in federated learning and covariate shift adaptation, we propose the
federated domain adaptation estimate of the target risk which is asymptotically
unbiased with a desirable asymptotic variance property. We construct a weighted
model for the target task and propose the federated covariate shift adaptation
algorithm which works preferably in our setting. The efficacy of our method is
justified both theoretically and empirically.
( 2
min )
This paper introduces a new framework of algebraic equivalence relations
between time series and new distance metrics between them, then applies these
to investigate the Australian "Black Summer" bushfire season of 2019-2020.
First, we introduce a general framework for defining equivalence between time
series, heuristically intended to be equivalent if they differ only up to
noise. Our first specific implementation is based on using change point
algorithms and comparing statistical quantities such as mean or variance in
stationary segments. We thus derive the existence of such equivalence relations
on the space of time series, such that the quotient spaces can be equipped with
a metrizable topology. Next, we illustrate specifically how to define and
compute such distances among a collection of time series and perform clustering
and additional analysis thereon. Then, we apply these insights to analyze air
quality data across New South Wales, Australia, during the 2019-2020 bushfires.
There, we investigate structural similarity with respect to this data and
identify locations that were impacted anomalously by the fires relative to
their location. This may have implications regarding the appropriate management
of resources to avoid gaps in the defense against future fires.
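A toy sketch of the equivalence idea (the paper uses change point algorithms; here fixed breakpoints stand in for the detected segmentation, and all names are illustrative):

```python
def segment_means(series, breakpoints):
    # Mean of each segment delimited by the given breakpoints
    segs, prev = [], 0
    for b in list(breakpoints) + [len(series)]:
        seg = series[prev:b]
        segs.append(sum(seg) / len(seg))
        prev = b
    return segs

def equivalent(s1, s2, breakpoints, tol=0.1):
    # Declare two series equivalent if their segment means agree up to noise
    m1, m2 = segment_means(s1, breakpoints), segment_means(s2, breakpoints)
    return all(abs(a - b) <= tol for a, b in zip(m1, m2))

a = [1.0, 1.02, 0.98, 5.0, 5.01, 4.99]
b = [1.01, 0.99, 1.0, 5.02, 4.98, 5.0]
same = equivalent(a, b, [3])
```

Two series that differ only by small noise within segments land in the same equivalence class.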
( 2
min )
Traffic systems can operate in different modes. In a previous work, we
identified these modes as different quasi-stationary states in the correlation
structure. Here, we analyze the transitions between such quasi-stationary
states, i.e., how the system changes its operational mode. In the longer run
this might be helpful to forecast the time evolution of correlation patterns in
traffic. Taking the Cologne orbital motorways as an example, we construct a state
transition network for each quarter of 2015 and find a seasonal dependence for
those quasi-stationary states in the traffic system. Using the PageRank
algorithm, we identify and explore the dominant states which occur frequently
within a moving time window of 60 days in 2015. To the best of our knowledge,
this is the first study of this type for traffic systems.
( 2
min )
Clustering is a widely used technique with a long and rich history in a
variety of areas. However, most existing algorithms do not scale well to large
datasets, or are missing theoretical guarantees of convergence. This paper
introduces a provably robust clustering algorithm based on loss minimization
that performs well on Gaussian mixture models with outliers. It provides
theoretical guarantees that the algorithm obtains high accuracy with high
probability under certain assumptions. Moreover, it can also be used as an
initialization strategy for $k$-means clustering. Experiments on real-world
large-scale datasets demonstrate the effectiveness of the algorithm when
clustering a large number of clusters, and a $k$-means algorithm initialized by
the algorithm outperforms many of the classic clustering methods in both speed
and accuracy, while scaling well to large datasets such as ImageNet.
( 2
min )
We propose a class of models based on Fisher's Linear Discriminant (FLD) in
the context of domain adaptation. The class is the convex combination of two
hypotheses: i) an average hypothesis representing previously seen source tasks
and ii) a hypothesis trained on a new target task. For a particular generative
setting we derive the optimal convex combination of the two models under 0-1
loss, propose a computable approximation, and study the effect of various
parameter settings on the relative risks between the optimal hypothesis,
hypothesis i), and hypothesis ii). We demonstrate the effectiveness of the
proposed optimal classifier in the context of EEG- and ECG-based classification
settings and argue that the optimal classifier can be computed without access
to direct information from any of the individual source tasks. We conclude by
discussing further applications, limitations, and possible future directions.
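Schematically (the paper's actual combination is derived under 0-1 loss for a particular generative setting, which is not reproduced here), a convex combination of two linear hypotheses looks like:

```python
def combine(source_w, target_w, w):
    # Score x with (1 - w) * source + w * target; predict by sign
    def classify(x):
        s = sum(a * b for a, b in zip(source_w, x))
        t = sum(a * b for a, b in zip(target_w, x))
        return 1 if (1 - w) * s + w * t >= 0 else -1
    return classify

# Equal-weight blend of a "source" and a "target" discriminant (toy weights)
clf = combine([1.0, 0.0], [0.0, 1.0], w=0.5)
```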
( 2
min )
It would be something similar to mnist-ready (https://github.com/saoj/mnist-ready) in Ruby, but in Python. See below:
digit = MNIST.all_set[0] # first one
# An integer corresponding to the digit of the image
puts digit.label # => 7
# pixels is a one-dimensional array of 784 (28 x 28) pixel values from 0 to 255
puts digit.pixels.size # => 784
puts digit.pixels.inspect # => [0, 0, 0, 0, ...
It has this nice feature which allows you to see the digits:
puts digit.ascii_image
# => prints the digit as 28x28 ASCII art in a bordered frame labeled "7"
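A minimal sketch of what the equivalent Python API could look like (hypothetical class and names; a real version would parse the MNIST IDX files rather than use synthetic pixels):

```python
class Digit:
    """Mirror of mnist-ready's Ruby interface: label plus flat pixel array."""

    def __init__(self, label, pixels):
        self.label = label
        self.pixels = pixels  # flat list of 784 (28 x 28) values, 0-255

    def ascii_image(self):
        # Map pixel intensity to a character ramp, one row per 28 pixels
        chars = ' .:-=+*#%@'
        rows = []
        for r in range(28):
            row = self.pixels[r * 28:(r + 1) * 28]
            rows.append(''.join(chars[p * len(chars) // 256] for p in row))
        return '\n'.join(rows)

digit = Digit(7, [0] * 784)  # synthetic blank image for demonstration
```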
submitted by /u/niosurfer
[link] [comments]
( 43
min )
Hi everyone. Now ChatRWKV v2 can split RWKV across multiple GPUs, or stream layers (compute layer-by-layer), so you can run RWKV 14B with as little as 3 GB of VRAM. https://github.com/BlinkDL/ChatRWKV
Example:
'cuda:0 fp16 *10 -> cuda:1 fp16 *8 -> cpu fp32' = first 10 layers on cuda:0 fp16, then 8 layers on cuda:1 fp16, then on cpu fp32
'cuda fp16 *20+' = first 20 layers on cuda fp16, then stream the rest on it
And RWKV is now a pip package: https://pypi.org/project/rwkv/
os.environ['RWKV_JIT_ON'] = '1'
os.environ["RWKV_CUDA_ON"] = '0' # if '1' then compile CUDA kernel for seq mode (much faster)
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS
pipeline = PIPELINE(model, "20B_tokenizer.json") # find it in https://github.com/BlinkDL/ChatRWKV
# download models: https://hugg…
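As a toy illustration of how a strategy string maps layers to devices (the real parser lives inside the rwkv package and handles more cases, e.g. the '+' streaming suffix; this sketch only covers '->' stages):

```python
def parse_strategy(strategy, n_layers):
    # Assign each of n_layers to a (device, dtype) stage from the strategy string
    plan, layer = [], 0
    stages = [s.split() for s in strategy.split('->')]
    for i, parts in enumerate(stages):
        device, dtype = parts[0], parts[1]
        if i == len(stages) - 1:
            count = n_layers - layer  # last stage takes the remaining layers
        else:
            count = int(parts[2].lstrip('*'))
        for _ in range(count):
            plan.append((layer, device, dtype))
            layer += 1
    return plan

# 10 layers on cuda:0, 8 on cuda:1, the rest on cpu (24-layer toy model)
plan = parse_strategy('cuda:0 fp16 *10 -> cuda:1 fp16 *8 -> cpu fp32', 24)
```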
( 45
min )
The fashion industry is a highly lucrative business, with an estimated value of $2.1 trillion by 2025, as reported by the World Bank. This field encompasses a diverse range of segments, such as the creation, manufacture, distribution, and sales of clothing, shoes, and accessories. The industry is in a constant state of change, with new […]
( 15
min )
This post is co-written with Suhyoung Kim, General Manager at KakaoGames Data Analytics Lab. Kakao Games is a top video game publisher and developer headquartered in South Korea. It specializes in developing and publishing games on PC, mobile, and virtual reality (VR) serving globally. In order to maximize its players’ experience and improve the efficiency […]
( 14
min )
Amazon Comprehend is a managed AI service that uses natural language processing (NLP) with ready-made intelligence to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. The ability to train custom models through the Custom classification and Custom entity […]
( 10
min )
The world we live in is rapidly changing, and so are the data and features that companies and customers use to train their models. Retraining models to keep them in sync with these changes is critical to maintain accuracy. Therefore, you need an agile and dynamic approach to keep models up to date and adapt […]
( 10
min )
The quest for knowledge at work can feel like searching for a needle in a haystack. But what if the haystack itself could reveal where the needle is? That’s the promise of large language models, or LLMs, the subject of this week’s episode of the NVIDIA AI Podcast featuring Deedy Das and Eddie Zhou, founding Read article >
( 5
min )
Please provide feedback so I can make it better and help the AI movement.
aitoptools.com
submitted by /u/aitoptools
[link] [comments]
( 41
min )
Developers can now integrate ChatGPT and Whisper models into their apps and products through our API.
( 5
min )
Announcements Are Generative Adversarial Networks Really Useful? Such a question may seem to come from a dinosaur, averse to change. Or from someone selling traditional methods and badmouthing anything that feels threatening to his business. This is not the case here: I always try to stay neutral, and usually – while typically not a first… Read More »DSC Weekly 28 February 2023 – Generative Adversarial Networks (GANs): Are They Really Useful?
The post DSC Weekly 28 February 2023 – Generative Adversarial Networks (GANs): Are They Really Useful? appeared first on Data Science Central.
( 21
min )
Back in 2018, I had the privilege of keynoting at one of Semantic Web Company’s events in Vienna, as well as attending the full event. It was a great opportunity to immerse myself in the Central European perspective on the utility of Linked Open Data standards and how those standards were being applied. I got… Read More »FAIR Content: Better Chatbot Answers and Content Reusability at Scale
The post FAIR Content: Better Chatbot Answers and Content Reusability at Scale appeared first on Data Science Central.
( 21
min )
In today’s highly competitive market, performing data analytics using machine learning (ML) models has become a necessity for organizations. It enables them to unlock the value of their data, identify trends, patterns, and predictions, and differentiate themselves from their competitors. For example, in the healthcare industry, ML-driven analytics can be used for diagnostic assistance and […]
( 12
min )
Fraud detection is an important problem that has applications in financial services, social media, ecommerce, gaming, and other industries. This post presents an implementation of a fraud detection solution using the Relational Graph Convolutional Network (RGCN) model to predict the probability that a transaction is fraudulent through both the transductive and inductive inference modes. You can deploy our implementation to an Amazon SageMaker endpoint as a real-time fraud detection solution, without requiring external graph storage or orchestration, thereby significantly reducing the deployment cost of the model.
( 11
min )
As the meteoric rise of ChatGPT demonstrates, generative AI can unlock enormous potential for companies, teams and individuals. Whether simplifying time-consuming tasks or accelerating 3D workflows to boost creativity and productivity, generative AI is already making an impact across industries — and there’s much more to come. How generative AI is paving the way for Read article >
( 5
min )
Brian Spears says his children will enjoy a more sustainable planet, thanks in part to AI and high performance computing (HPC) simulations. “I believe I’ll see fusion energy in my lifetime, and I’m confident my daughters will see a fusion-powered world,” said the 45-year-old principal investigator at Lawrence Livermore National Laboratory who helped demonstrate the Read article >
( 6
min )
ManvsMachine steps In the NVIDIA Studio this week to share insights behind fractal art — which uses algorithms to artistically represent calculations — derived from geometric objects as digital images and animations.
( 6
min )
Streaming video on PCs through Google Chrome and Microsoft Edge browsers is getting a GeForce RTX-sized upgrade today with the release of RTX Video Super Resolution (VSR). Nearly 80% of internet bandwidth today is streaming video. And 90% of that content streams at 1080p or lower, including from popular sources like Twitch.tv, YouTube, Netflix, Disney+ Read article >
( 6
min )
Inferring causal structure from data is a challenging task of fundamental
importance in science. Observational data are often insufficient to identify a
system's causal structure uniquely. While conducting interventions (i.e.,
experiments) can improve the identifiability, such samples are usually
challenging and expensive to obtain. Hence, experimental design approaches for
causal discovery aim to minimize the number of interventions by estimating the
most informative intervention target. In this work, we propose a novel
Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts'
the gradient estimator of a gradient-based causal discovery framework to
provide signals for the intervention acquisition function. We provide extensive
experiments in simulated and real-world datasets and demonstrate that GIT
performs on par with competitive baselines, surpassing them in the low-data
regime.
( 2
min )
In this work, we propose a self-improving artificial intelligence system to
enhance the safety performance of reinforcement learning (RL)-based autonomous
driving (AD) agents using black-box verification methods. RL algorithms have
become popular in AD applications in recent years. However, the performance of
existing RL algorithms heavily depends on the diversity of training scenarios.
A lack of safety-critical scenarios during the training phase could result in
poor generalization performance in real-world driving applications. We propose
a novel framework in which the weaknesses of the training set are explored
through black-box verification methods. After discovering AD failure scenarios,
the RL agent's training is re-initiated via transfer learning to improve the
performance of previously unsafe scenarios. Simulation results demonstrate that
our approach efficiently discovers safety failures of action decisions in
RL-based adaptive cruise control (ACC) applications and significantly reduces
the number of vehicle collisions through iterative applications of our method.
The source code is publicly available at
https://github.com/data-and-decision-lab/self-improving-RL.
( 2
min )
In the end-of-line test of geared motors, the evaluation of product quality
is important. Due to time constraints and the high diversity of variants,
acoustic measurements are more economical than vibration measurements.
However, the acoustic data is affected by industrial disturbing noise.
Therefore, the aim of this study is to investigate the robustness of features
used for anomaly detection in geared motor end-of-line testing. A real-world
dataset with typical faults and acoustic disturbances is recorded by an
acoustic array. This includes industrial noise from the production and
systematically produced disturbances, used to compare the robustness. Overall,
it is proposed to apply features extracted from a log-envelope spectrum
together with psychoacoustic features. The anomaly detection is done by using
the isolation forest or the more universal bagging random miner. Most
disturbances can be circumvented, while the use of a hammer or air pressure
often causes problems. In general, these results are important for condition
monitoring tasks that are based on acoustic or vibration measurements.
Furthermore, a real-world problem description is presented to improve common
signal processing and machine learning tasks.
( 2
min )
The recent literature on online learning to rank (LTR) has established the
utility of prior knowledge to Bayesian ranking bandit algorithms. However, a
major limitation of existing work is the requirement for the prior used by the
algorithm to match the true prior. In this paper, we propose and analyze
adaptive algorithms that address this issue and additionally extend these
results to the linear and generalized linear models. We also consider scalar
relevance feedback on top of click feedback. Moreover, we demonstrate the
efficacy of our algorithms using both synthetic and real-world experiments.
( 2
min )
Research on deep reinforcement learning (DRL) based production scheduling
(PS) has gained a lot of attention in recent years, primarily due to the high
demand for optimizing scheduling problems in diverse industry settings.
Numerous studies are carried out and published as stand-alone experiments that
often vary only slightly with respect to problem setups and solution
approaches. The programmatic core of these experiments is typically very
similar. Despite this fact, no standardized and resilient framework for
experimentation on PS problems with DRL algorithms could be established so far.
In this paper, we introduce schlably, a Python-based framework that provides
researchers a comprehensive toolset to facilitate the development of PS
solution strategies based on DRL. schlably eliminates the redundant overhead
work that the creation of a sturdy and flexible backbone requires and increases
the comparability and reusability of conducted research work.
( 2
min )
Distributed deep learning (DDL) systems strongly depend on network
performance. Current electronic packet switched (EPS) network architectures and
technologies suffer from variable diameter topologies, low-bisection bandwidth
and over-subscription affecting completion time of communication and collective
operations.
We introduce a near-exascale, full-bisection bandwidth, all-to-all,
single-hop, all-optical network architecture with nanosecond reconfiguration
called RAMP, which supports large-scale distributed and parallel computing
systems (12.8 Tbps per node for up to 65,536 nodes).
For the first time, a custom RAMP-x MPI strategy and a network transcoder is
proposed to run MPI collective operations across the optical circuit switched
(OCS) network in a schedule-less and contention-less manner. RAMP achieves
7.6-171$\times$ speed-up in completion time across all MPI operations compared
to realistic EPS and OCS counterparts. It can also deliver a 1.3-16$\times$ and
7.8-58$\times$ reduction in Megatron and DLRM training time respectively while
offering 42-53$\times$ and 3.3-12.4$\times$ improvement in energy consumption
and cost respectively.
( 2
min )
In the context of keyword spotting (KWS), the replacement of handcrafted
speech features by learnable features has not yielded superior KWS performance.
In this study, we demonstrate that filterbank learning outperforms handcrafted
speech features for KWS whenever the number of filterbank channels is severely
decreased. Reducing the number of channels might yield certain KWS performance
drop, but also a substantial energy consumption reduction, which is key when
deploying common always-on KWS on low-resource devices. Experimental results on
a noisy version of the Google Speech Commands Dataset show that filterbank
learning adapts to noise characteristics to provide a higher degree of
robustness to noise, especially when dropout is integrated. Thus, switching
from typically used 40-channel log-Mel features to 8-channel learned features
leads to a relative KWS accuracy loss of only 3.5% while simultaneously
achieving a 6.3x energy consumption reduction.
( 2
min )
The imputation of missing values represents a significant obstacle for many
real-world data analysis pipelines. Here, we focus on time series data and put
forward SSSD, an imputation model that relies on two emerging technologies,
(conditional) diffusion models as state-of-the-art generative models and
structured state space models as internal model architecture, which are
particularly suited to capture long-term dependencies in time series data. We
demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic
imputation and forecasting performance on a broad range of data sets and
different missingness scenarios, including the challenging blackout-missing
scenarios, where prior approaches failed to provide meaningful results.
( 2
min )
In this paper, we study first-order algorithms for solving fully composite
optimization problems over bounded sets. We treat the differentiable and
non-differentiable parts of the objective separately, linearizing only the
smooth components. This provides us with new generalizations of the classical
and accelerated Frank-Wolfe methods, that are applicable to non-differentiable
problems whenever we can access the structure of the objective. We prove global
complexity bounds for our algorithms that are optimal in several settings.
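For background, the classical Frank-Wolfe method that the paper generalizes can be sketched on the probability simplex with a toy smooth objective (the paper's fully composite, non-differentiable setting is more general and not shown here):

```python
def frank_wolfe(grad_f, n, steps=500):
    # Classical Frank-Wolfe over the probability simplex with step 2/(t+2)
    x = [1.0 / n] * n
    for t in range(steps):
        g = grad_f(x)
        # Linear minimization oracle on the simplex: the vertex e_i
        # with the smallest gradient coordinate
        i = min(range(n), key=lambda j: g[j])
        gamma = 2.0 / (t + 2)
        x = [(1 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

c = [0.2, 0.3, 0.5]  # minimizer of f(x) = ||x - c||^2 lies in the simplex
x = frank_wolfe(lambda x: [2 * (xj - cj) for xj, cj in zip(x, c)], 3)
```

Each iterate stays in the simplex by construction, since it is a convex combination of vertices.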
( 2
min )
This paper describes our participation in SemEval-2023 Task 9, Intimacy
Analysis of Multilingual Tweets. We fine-tune some of the most popular
transformer models with the training dataset and synthetic data generated by
different data augmentation techniques. During the development phase, our best
results were obtained by using XLM-T. Data augmentation techniques provide a
very slight improvement in the results. Our system ranked in the 27th position
out of the 45 participating systems. Despite its modest results, our system
shows promising results in languages such as Portuguese, English, and Dutch.
All our code is available in the repository
\url{https://github.com/isegura/hulat_intimacy}.
( 2
min )
We study the problem of inferring heterogeneous treatment effects (HTEs) from
time-to-event data in the presence of competing events. Albeit its great
practical relevance, this problem has received little attention compared to its
counterparts studying HTE estimation without time-to-event data or competing
events. We take an outcome modeling approach to estimating HTEs, and consider
how and when existing prediction models for time-to-event data can be used as
plug-in estimators for potential outcomes. We then investigate whether
competing events present new challenges for HTE estimation -- in addition to
the standard confounding problem --, and find that, because there are multiple
definitions of causal effects in this setting -- namely total, direct and
separable effects --, competing events can act as an additional source of
covariate shift depending on the desired treatment effect interpretation and
associated estimand. We theoretically analyze and empirically illustrate when
and how these challenges play a role when using generic machine learning
prediction models for the estimation of HTEs.
( 2
min )
In this study, we validate the findings of previously published papers,
showing the feasibility of an Electroencephalography (EEG) based gaze
estimation. Moreover, we extend previous research by demonstrating that with
only a slight drop in model performance, we can significantly reduce the number
of electrodes, indicating that a high-density, expensive EEG cap is not
necessary for the purposes of EEG-based eye tracking. Using data-driven
approaches, we establish which electrode clusters impact gaze estimation and
how the different types of EEG data preprocessing affect the models'
performance. Finally, we also inspect which recorded frequencies are most
important for the defined tasks.
( 2
min )
In the present work, we introduce a novel approach to enhance the precision
of reduced order models by exploiting a multi-fidelity perspective and
DeepONets. Reduced models provide a real-time numerical approximation by
simplifying the original model. The error introduced by such operation is
usually neglected and sacrificed in order to reach a fast computation. We
propose to couple the model reduction to a machine learning residual learning,
such that the above-mentioned error can be learnt by a neural network and
inferred for new predictions. We emphasize that the framework maximizes the
exploitation of the high-fidelity information, using it for building the
reduced order model and for learning the residual. In this work we explore the
integration of proper orthogonal decomposition (POD), and gappy POD for sensors
data, with the recent DeepONet architecture. Numerical investigations for a
parametric benchmark function and a nonlinear parametric Navier-Stokes problem
are presented.
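A toy illustration of the residual-learning idea (the paper uses POD reduced models and DeepONets; here a 1-D least-squares fit stands in for the neural residual model, and all data is synthetic):

```python
def fit_residual(xs, hi, lo):
    # Fit r(x) ~ a*x + b to the high-minus-low-fidelity error by least squares
    rs = [h - l for h, l in zip(hi, lo)]
    n = len(xs)
    mx, mr = sum(xs) / n, sum(rs) / n
    a = sum((x - mx) * (r - mr) for x, r in zip(xs, rs)) \
        / sum((x - mx) ** 2 for x in xs)
    b = mr - a * mx
    return lambda x: a * x + b

lo_model = lambda x: x                  # cheap "reduced-order" model
xs = [0.0, 1.0, 2.0, 3.0]
hi_vals = [1.5 * x for x in xs]         # "high-fidelity" ground truth
res = fit_residual(xs, hi_vals, [lo_model(x) for x in xs])
corrected = lo_model(2.5) + res(2.5)    # reduced model plus learned residual
```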
( 2
min )
Federated learning (FL) was originally regarded as a framework for
collaborative learning among clients with data privacy protection through a
coordinating server. In this paper, we propose a new active membership
inference (AMI) attack carried out by a dishonest server in FL. In AMI attacks,
the server crafts and embeds malicious parameters into global models to
effectively infer whether a target data sample is included in a client's
private training data or not. By exploiting the correlation among data features
through a non-linear decision boundary, AMI attacks with a certified guarantee
of success can achieve severely high success rates under rigorous local
differential privacy (LDP) protection; thereby exposing clients' training data
to significant privacy risk. Theoretical and experimental results on several
benchmark datasets show that adding sufficient privacy-preserving noise to
prevent our attack would significantly damage FL's model utility.
( 2
min )
Accurate and real-time traffic state prediction is of great practical
importance for urban traffic control and web mapping services (e.g. Google
Maps). With the support of massive data, deep learning methods have shown their
powerful capability in capturing the complex spatio-temporal patterns of road
networks. However, existing approaches use independent components to model
temporal and spatial dependencies and thus ignore the heterogeneous
characteristics of traffic flow that vary with time and space. In this paper,
we propose a novel dynamic graph convolution network with spatio-temporal
attention fusion. The method not only captures local spatio-temporal
information that changes over time, but also comprehensively models
long-distance and multi-scale spatio-temporal patterns based on the fusion
mechanism of temporal and spatial attention. This design idea can greatly
improve the spatio-temporal perception of the model. We conduct extensive
experiments in 4 real-world datasets to demonstrate that our model achieves
state-of-the-art performance compared to 22 baseline models.
( 2
min )
To address the problem of medical image recognition, computer vision
techniques like convolutional neural networks (CNN) are frequently used.
Recently, 3D CNN-based models dominate the field of magnetic resonance image
(MRI) analytics. Due to the high similarity between MRI data and videos, we
conduct extensive empirical studies on video recognition techniques for MRI
classification to answer the questions: (1) can we directly use video
recognition models for MRI classification, (2) which model is more appropriate
for MRI, (3) are the common tricks like data augmentation in video recognition
still useful for MRI classification? Our work suggests that advanced video
techniques benefit MRI classification. In this paper, four datasets of
Alzheimer's and Parkinson's disease recognition are utilized in experiments,
together with three alternative video recognition models and data augmentation
techniques that are frequently applied to video tasks. In terms of efficiency,
the results reveal that the video framework performs better than 3D-CNN models
by 5% - 11% with 50% - 66% fewer trainable parameters. This report pushes
forward the potential fusion of 3D medical imaging and video understanding
research.
( 2 min )
Despite the major progress of deep models as learning machines, uncertainty
estimation remains a major challenge. Existing solutions rely on modified loss
functions or architectural changes. We propose to compensate for the lack of
built-in uncertainty estimates by supplementing any network, retrospectively,
with a subsequent vine copula model, in an overall compound we call Vine-Copula
Neural Network (VCNN). Through synthetic and real-data experiments, we show
that VCNNs could be task (regression/classification) and architecture
(recurrent, fully connected) agnostic while providing reliable and
better-calibrated uncertainty estimates, comparable to state-of-the-art
built-in uncertainty solutions.
( 2 min )
This paper presents a novel approach for multimodal data fusion based on the
Vector-Quantized Variational Autoencoder (VQVAE) architecture. The proposed
method is simple yet effective in achieving excellent reconstruction
performance on paired MNIST-SVHN data and WiFi spectrogram data. Additionally,
the multimodal VQVAE model is extended to the 5G communication scenario, where
an end-to-end Channel State Information (CSI) feedback system is implemented to
compress data transmitted between the base-station (eNodeB) and User Equipment
(UE), without significant loss of performance. The proposed model learns a
discriminative compressed feature space for various types of input data (CSI,
spectrograms, natural images, etc.), making it a suitable solution for
applications with limited computational resources.
( 2 min )
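The vector-quantization step at the core of a VQVAE replaces each encoder output with its nearest codebook entry. A minimal sketch of that nearest-neighbour lookup (the 2-D codebook values here are illustrative, not from the paper; a trained VQVAE learns them jointly with the encoder/decoder):

```python
import math

def quantize(vec, codebook):
    """Index and value of the codebook entry nearest to vec (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    idx = min(range(len(codebook)), key=lambda i: dist(vec, codebook[i]))
    return idx, codebook[idx]

# Illustrative 2-D codebook with four entries
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
idx, code = quantize((0.9, 0.2), codebook)   # -> index 1, code (1.0, 0.0)
```

In the CSI-feedback setting, only the index stream needs to be transmitted, which is where the compression comes from.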
To accelerate the inference of deep neural networks (DNNs), quantization with
low-bitwidth numbers is actively researched. A prominent challenge is to
quantize the DNN models into low-bitwidth numbers without significant accuracy
degradation, especially at very low bitwidths (< 8 bits). This work targets an
adaptive data representation with variable-length encoding called DyBit. DyBit
can dynamically adjust the precision and range of separate bit-field to be
adapted to the DNN weights/activations distribution. We also propose a
hardware-aware quantization framework with a mixed-precision accelerator to
trade-off the inference accuracy and speedup. Experimental results demonstrate
that the inference accuracy via DyBit is 1.997% higher than the
state-of-the-art at 4-bit quantization, and the proposed framework can achieve
up to 8.1x speedup compared with the original model.
( 2 min )
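DyBit's adaptive, variable-length bit-fields are not specified in the abstract; for context, the fixed-point baseline such schemes improve on (symmetric uniform quantization at a given bitwidth) can be sketched as:

```python
def quantize_symmetric(xs, bits=4):
    """Symmetric uniform quantization to integers in [-(2^(bits-1)-1), 2^(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                    # 7 at 4 bits
    m = max(abs(x) for x in xs)
    scale = m / qmax if m > 0 else 1.0            # one scale for the whole tensor
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.31, -0.7, 0.05, 0.9]
q, scale = quantize_symmetric(weights)            # q == [2, -5, 0, 7]
approx = dequantize(q, scale)                     # coarse reconstruction of weights
```

The fixed split between range and precision in this baseline is exactly what an adaptive representation like DyBit relaxes per value distribution.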
We study differentially private (DP) machine learning algorithms as instances
of noisy fixed-point iterations, in order to derive privacy and utility results
from this well-studied framework. We show that this new perspective recovers
popular private gradient-based methods like DP-SGD and provides a principled
way to design and analyze new private optimization algorithms in a flexible
manner. Focusing on the widely-used Alternating Directions Method of
Multipliers (ADMM) method, we use our general framework to derive novel private
ADMM algorithms for centralized, federated and fully decentralized learning.
For these three algorithms, we establish strong privacy guarantees leveraging
privacy amplification by iteration and by subsampling. Finally, we provide
utility guarantees using a unified analysis that exploits a recent linear
convergence result for noisy fixed-point iterations.
( 2 min )
Recent advancements in interpretability research made transformer language
models more transparent. This progress led to a better understanding of their
inner workings for toy and naturally occurring models. However, how these
models internally process sentiment changes has yet to be sufficiently
answered. In this work, we introduce a new interpretability tool called PCP
ablation, where we replace modules with low-rank matrices based on the
principal components of their activations, reducing model parameters and their
behavior to essentials. We demonstrate PCP ablations on MLP and attention
layers in backdoored toy, backdoored large, and naturally occurring models. We
determine MLPs as most important for the backdoor mechanism and use this
knowledge to remove, insert, and modify backdoor mechanisms with engineered
replacements via PCP ablation.
( 2 min )
We prove that the set of functions representable by ReLU neural networks with
integer weights strictly increases with the network depth while allowing
arbitrary width. More precisely, we show that $\lceil\log_2(n)\rceil$ hidden
layers are indeed necessary to compute the maximum of $n$ numbers, matching
known upper bounds. Our results are based on the known duality between neural
networks and Newton polytopes via tropical geometry. The integrality assumption
implies that these Newton polytopes are lattice polytopes. Then, our depth
lower bounds follow from a parity argument on the normalized volume of faces of
such polytopes.
( 2 min )
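The matching upper bound comes from the identity max(a, b) = a + ReLU(b - a), applied in a pairwise tournament: $\lceil\log_2(n)\rceil$ rounds suffice to compute the maximum of $n$ numbers. A sketch of that construction as plain arithmetic:

```python
def relu(x):
    return max(x, 0.0)

def max2(a, b):
    # max(a, b) = a + ReLU(b - a): computable by a small ReLU layer
    return a + relu(b - a)

def relu_max(xs):
    """Pairwise tournament: ceil(log2(n)) rounds of max2 give max(xs)."""
    xs = list(xs)
    while len(xs) > 1:
        nxt = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:              # odd element passes through this round
            nxt.append(xs[-1])
        xs = nxt
    return xs[0]
```

The paper's contribution is the other direction: showing this depth is also necessary under integer weights.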
Morphological atlases are an important tool in organismal studies, and modern
high-throughput Computed Tomography (CT) facilities can produce hundreds of
full-body high-resolution volumetric images of organisms. However, creating an
atlas from these volumes requires accurate organ segmentation. In the last
decade, machine learning approaches have achieved incredible results in image
segmentation tasks, but they require large amounts of annotated data for
training. In this paper, we propose a self-training framework for multi-organ
segmentation in tomographic images of Medaka fish. We utilize the
pseudo-labeled data from a pretrained Teacher model and adopt a Quality
Classifier to refine the pseudo-labeled data. Then, we introduce a pixel-wise
knowledge distillation method to prevent overfitting to the pseudo-labeled data
and improve the segmentation performance. The experimental results demonstrate
that our method improves mean Intersection over Union (IoU) by 5.9% on the full
dataset and maintains segmentation quality while using three times less annotation.
( 2 min )
Studies involving both randomized experiments as well as observational data
typically involve time-to-event outcomes such as time-to-failure, death or
onset of an adverse condition. Such outcomes are typically subject to censoring
due to loss of follow-up and established statistical practice involves
comparing treatment efficacy in terms of hazard ratios between the treated and
control groups. In this paper we propose a statistical approach to recovering
sparse phenogroups (or subtypes) that demonstrate differential treatment
effects as compared to the study population. Our approach involves modelling
the data as a mixture while enforcing parameter shrinkage through structured
sparsity regularization. We propose a novel inference procedure for the
proposed model and demonstrate its efficacy in recovering sparse phenotypes
across large landmark real world clinical studies in cardiovascular health.
( 2 min )
Previous pitch-controllable text-to-speech (TTS) models rely on directly
modeling fundamental frequency, leading to low variance in synthesized speech.
To address this issue, we propose PITS, an end-to-end pitch-controllable TTS
model that utilizes variational inference to model pitch. Based on VITS, PITS
incorporates the Yingram encoder, the Yingram decoder, and adversarial training
of pitch-shifted synthesis to achieve pitch-controllability. Experiments
demonstrate that PITS generates high-quality speech that is indistinguishable
from ground truth speech and has high pitch-controllability without quality
degradation. Code and audio samples will be available at
https://github.com/anonymous-pits/pits.
( 2 min )
Effectively scaling large Transformer models is a main driver of recent
advances in natural language processing. Dynamic neural networks, as an
emerging research direction, are capable of scaling up neural networks with
sub-linear increases in computation and time by dynamically adjusting their
computational path based on the input. Dynamic neural networks could be a
promising solution to the growing parameter numbers of pretrained language
models, allowing both model pretraining with trillions of parameters and faster
inference on mobile devices. In this survey, we summarize progress of three
types of dynamic neural networks in NLP: skimming, mixture of experts, and
early exit. We also highlight current challenges in dynamic neural networks and
directions for future research.
( 2 min )
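Of the three mechanisms surveyed, early exit is the simplest to sketch: each layer gets a classifier head, and inference stops at the first head whose confidence clears a threshold. The per-head logits below are illustrative stand-ins for real model outputs:

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def early_exit(per_layer_logits, threshold=0.9):
    """Return (predicted_class, exit_layer); stop at the first confident head."""
    pred, layer = None, None
    for layer, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        pred = probs.index(max(probs))
        if max(probs) >= threshold:
            break                        # confident enough: skip remaining layers
    return pred, layer

# Illustrative logits from three classifier heads on one input
heads = [[0.2, 0.3, 0.1], [1.0, 2.5, 0.3], [0.5, 6.0, 0.2]]
pred, layer = early_exit(heads)          # only exits at the confident third head
```

Easy inputs exit early and pay for fewer layers, which is the sub-linear compute scaling the survey highlights.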
Contextual bandit algorithms often estimate reward models to inform
decision-making. However, true rewards can contain action-independent
redundancies that are not relevant for decision-making. We show it is more
data-efficient to estimate any function that explains the reward differences
between actions, that is, the treatment effects. Motivated by this observation,
building on recent work on oracle-based bandit algorithms, we provide the first
reduction of contextual bandits to general-purpose heterogeneous treatment
effect estimation, and we design a simple and computationally efficient
algorithm based on this reduction. Our theoretical and experimental results
demonstrate that heterogeneous treatment effect estimation in contextual
bandits offers practical advantages over reward estimation, including more
efficient model estimation and greater robustness to model misspecification.
( 2 min )
Non-asymptotic statistical analysis is often missing for modern
geometry-aware machine learning algorithms due to the possibly intricate
non-linear manifold structure. This paper studies an intrinsic mean model on
the manifold of restricted positive semi-definite matrices and provides a
non-asymptotic statistical analysis of the Karcher mean. We also consider a
general extrinsic signal-plus-noise model, under which a deterministic error
bound of the Karcher mean is provided. As an application, we show that the
distributed principal component analysis algorithm, LRC-dPCA, achieves the same
performance as the full sample PCA algorithm. Numerical experiments lend strong
support to our theories.
( 2 min )
Traffic prediction is a flourishing research field due to its importance in
human mobility in the urban space. Despite this, existing studies only focus on
short-term prediction of up to a few hours in advance, with most being up to one
hour only. Long-term traffic prediction can enable more comprehensive,
informed, and proactive measures against traffic congestion and is therefore an
important task to explore. In this paper, we explore the task of long-term
traffic prediction; where we predict traffic up to 24 hours in advance. We note
the weaknesses of existing models--which are based on recurrent structures--for
long-term traffic prediction and propose a modified Transformer model
"TrafFormer". Experiments comparing our model with existing hybrid neural
network models show the superiority of our model.
( 2 min )
In sponsored search advertising (SSA), keywords serve as the basic unit of
business model, linking three stakeholders: consumers, advertisers and search
engines. This paper presents an overarching framework for keyword decisions
that highlights the touchpoints in search advertising management, including
four levels of keyword decisions, i.e., domain-specific keyword pool
generation, keyword targeting, keyword assignment and grouping, and keyword
adjustment. Using this framework, we review the state-of-the-art research
literature on keyword decisions with respect to techniques, input features and
evaluation metrics. Finally, we discuss evolving issues and identify potential
gaps that exist in the literature and outline novel research perspectives for
future exploration.
( 2 min )
The cosmic microwave background (CMB) is a significant source of knowledge
about the origin and evolution of our universe. However, observations of the
CMB are contaminated by foreground emissions, obscuring the CMB signal and
reducing its efficacy in constraining cosmological parameters. We employ deep
learning as a data-driven approach to CMB cleaning from multi-frequency
full-sky maps. In particular, we develop a graph-based Bayesian convolutional
neural network based on the U-Net architecture that predicts cleaned CMB with
pixel-wise uncertainty estimates. We demonstrate the potential of this
technique on realistic simulated data based on the Planck mission. We show that
our model accurately recovers the cleaned CMB sky map and resulting angular
power spectrum while identifying regions of uncertainty. Finally, we discuss
the current challenges and the path forward for deploying our model for CMB
recovery on real observations.
( 2 min )
Modelling stockpiles is a key factor in a mining project's economics and
operation, because not all mined ore can be milled, for many reasons.
Further, the financial value of the ore in the stockpile needs to be reflected
on the balance sheet. Therefore, automatically tracking the frontiers of the
stockpile facilitates the mine scheduling engineers to calculate the tonnage of
the ore remaining in the stockpile. This paper suggests how the dynamic of
stockpile shape changes caused by dumping and reclaiming operations can be
inferred using polygon models. The presented work also demonstrates how the
geometry of stockpiles can be inferred in the absence of reclaimed bucket
information, in which case the reclaim polygons are established using the
diggers' GPS positional data at the time of truck loading. This work further
compares two polygon models for creating 2D shapes.
( 2 min )
Fast model updates for unseen tasks on intelligent edge devices are crucial
but also challenging due to the limited computational power. In this paper, we
propose MetaLDC, which meta-trains brain-inspired, ultra-efficient
low-dimensional computing classifiers to enable fast adaptation on tiny devices
with minimal computational costs. Concretely, during the meta-training stage,
MetaLDC meta-trains a representation offline by explicitly taking into account
that the final (binary) class layer will be fine-tuned for fast adaptation for
unseen tasks on tiny devices; during the meta-testing stage, MetaLDC uses
closed-form gradients of the loss function to enable fast adaptation of the
class layer. Unlike traditional neural networks, MetaLDC is designed based on
the emerging LDC framework to enable ultra-efficient on-device inference. Our
experiments have demonstrated that compared to SOTA baselines, MetaLDC achieves
higher accuracy and robustness against random bit errors, as well as more
cost-efficient hardware computation.
( 2 min )
Since its introduction in 2017, physics-informed deep learning (PIDL) has
garnered growing popularity in understanding the evolution of systems governed
by physical laws in terms of partial differential equations (PDEs). However,
empirical evidence points to the limitations of PIDL for learning certain types
of PDEs. In this paper, we (a) present the challenges in training the PIDL
architecture, (b) contrast its performance in learning a
first-order scalar hyperbolic conservation law and its parabolic counterpart,
(c) investigate the effect of training data sampling, which corresponds to
various sensing scenarios in traffic networks, (d) comment on the implications
of PIDL limitations for traffic flow estimation and prediction in practice.
In a detailed case study, we present the contradistinction in PIDL results
between learning the traffic flow model (LWR PDE) and its variation with
diffusion. The outcome indicates that PIDL experiences significant challenges
in learning the hyperbolic LWR equation due to the non-smoothness of its
solution. On the other hand, the architecture with parabolic PDE, augmented
with the diffusion term, leads to the successful reassembly of the density data
even with the shockwaves present.
( 2 min )
Federated learning (FL) is a popular technique for training a global model on
data distributed across client devices. Like other distributed training
techniques, FL is susceptible to straggler (slower or failed) clients. Recent
work has proposed to address this through device-to-device (D2D) offloading,
which introduces privacy concerns. In this paper, we propose a novel
straggler-optimal approach for coded matrix computations which can
significantly reduce the communication delay and privacy issues introduced from
D2D data transmissions in FL. Moreover, our proposed approach leads to a
considerable improvement of the local computation speed when the generated data
matrix is sparse. Numerical evaluations confirm the superiority of our proposed
method over baseline approaches.
( 2 min )
We revisit the original approach of using deep learning and neural networks
to solve differential equations by incorporating the knowledge of the equation.
This is done by adding a dedicated term to the loss function during the
optimization procedure in the training process. The so-called physics-informed
neural networks (PINNs) are tested on a variety of academic ordinary
differential equations in order to highlight the benefits and drawbacks of this
approach with respect to standard integration methods. We focus on the
possibility of using the least possible amount of data in the training process.
The principles of PINNs for solving differential equations by enforcing
physical laws via penalizing terms are reviewed. A tutorial on a simple
equation model illustrates how to put into practice the method for ordinary
differential equations. Benchmark tests show that a very small amount of
training data is sufficient to predict the solution when the nonlinearity of
the problem is weak. However, this is not the case for strongly nonlinear
problems, where a priori knowledge of training data over some partial or the
whole time integration interval is necessary.
( 2 min )
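The penalty-term idea can be illustrated on the toy ODE u'(t) = -u(t), u(0) = 1, whose exact solution is exp(-t). For brevity the neural network is replaced here by a degree-4 polynomial ansatz (an assumption; the loss structure, a data term plus the physics residual at collocation points, is the same as in PINNs):

```python
import math

ts = [0.0, 0.25, 0.5, 0.75, 1.0]    # collocation points on [0, 1]

def u(c, t):                        # polynomial ansatz u(t) = sum_k c_k t^k
    return sum(ck * t ** k for k, ck in enumerate(c))

def residual(c, t):                 # ODE residual u'(t) + u(t); zero for exp(-t)
    du = sum(k * ck * t ** (k - 1) for k, ck in enumerate(c) if k > 0)
    return du + u(c, t)

def loss(c):
    data = (u(c, 0.0) - 1.0) ** 2                        # boundary condition term
    physics = sum(residual(c, t) ** 2 for t in ts) / len(ts)  # penalty term
    return data + physics

def grad(c):                        # analytic gradient of the quadratic loss
    g = [0.0] * len(c)
    g[0] += 2.0 * (u(c, 0.0) - 1.0)
    for t in ts:
        r = residual(c, t)
        for j in range(len(c)):
            dphi = (j * t ** (j - 1) if j > 0 else 0.0) + t ** j
            g[j] += 2.0 * r * dphi / len(ts)
    return g

c = [0.0] * 5
for _ in range(8000):               # plain gradient descent on the penalized loss
    g = grad(c)
    c = [ck - 0.02 * gk for ck, gk in zip(c, g)]
```

No solution data is used beyond the initial condition: the physics penalty alone pins the trajectory down, which is the point the review makes about weakly nonlinear problems.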
Inferring causal structure from data is a challenging task of fundamental
importance in science. Observational data are often insufficient to identify a
system's causal structure uniquely. While conducting interventions (i.e.,
experiments) can improve the identifiability, such samples are usually
challenging and expensive to obtain. Hence, experimental design approaches for
causal discovery aim to minimize the number of interventions by estimating the
most informative intervention target. In this work, we propose a novel
Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts'
the gradient estimator of a gradient-based causal discovery framework to
provide signals for the intervention acquisition function. We provide extensive
experiments in simulated and real-world datasets and demonstrate that GIT
performs on par with competitive baselines, surpassing them in the low-data
regime.
( 2 min )
The imputation of missing values represents a significant obstacle for many
real-world data analysis pipelines. Here, we focus on time series data and put
forward SSSD, an imputation model that relies on two emerging technologies,
(conditional) diffusion models as state-of-the-art generative models and
structured state space models as internal model architecture, which are
particularly suited to capture long-term dependencies in time series data. We
demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic
imputation and forecasting performance on a broad range of data sets and
different missingness scenarios, including the challenging blackout-missing
scenarios, where prior approaches failed to provide meaningful results.
( 2 min )
Bayesian additive regression trees (BART) is a semi-parametric regression
model offering state-of-the-art performance on out-of-sample prediction.
Despite this success, standard implementations of BART typically provide
inaccurate prediction and overly narrow prediction intervals at points outside
the range of the training data. This paper proposes a novel extrapolation
strategy that grafts Gaussian processes to the leaf nodes in BART for
predicting points outside the range of the observed data. The new method is
compared to standard BART implementations and recent frequentist
resampling-based methods for predictive inference. We apply the new approach to
a challenging problem from causal inference, wherein for some regions of
predictor space, only treated or untreated units are observed (but not both).
In simulation studies, the new approach boasts superior performance compared to
popular alternatives, such as Jackknife+.
( 2 min )
We study the problem of inferring heterogeneous treatment effects (HTEs) from
time-to-event data in the presence of competing events. Despite its great
practical relevance, this problem has received little attention compared to its
counterparts studying HTE estimation without time-to-event data or competing
events. We take an outcome modeling approach to estimating HTEs, and consider
how and when existing prediction models for time-to-event data can be used as
plug-in estimators for potential outcomes. We then investigate whether
competing events present new challenges for HTE estimation -- in addition to
the standard confounding problem --, and find that, because there are multiple
definitions of causal effects in this setting -- namely total, direct and
separable effects --, competing events can act as an additional source of
covariate shift depending on the desired treatment effect interpretation and
associated estimand. We theoretically analyze and empirically illustrate when
and how these challenges play a role when using generic machine learning
prediction models for the estimation of HTEs.
( 2 min )
In this paper, we introduce two methods to solve the American-style option
pricing problem and its dual form at the same time using neural networks.
Without applying nested Monte Carlo, the first method uses a series of neural
networks to simultaneously compute both the lower and upper bounds of the
option price, and the second one accomplishes the same goal with one global
network. The avoidance of extra simulations and the use of neural networks
significantly reduce the computational complexity and allow us to price
Bermudan options with frequent exercise opportunities in high dimensions, as
illustrated by the provided numerical experiments. As a by-product, these
methods also derive a hedging strategy for the option, which can also be used
as a control variate for variance reduction.
( 2 min )
A Shared Nearest Neighbor (SNN) graph is a type of graph construction using
shared nearest neighbor information, which is a secondary similarity measure
based on the rankings induced by a primary $k$-nearest neighbor ($k$-NN)
measure. SNN measures have been touted as being less prone to the curse of
dimensionality than conventional distance measures, and thus methods using SNN
graphs have been widely used in applications, particularly in clustering
high-dimensional data sets and in finding outliers in subspaces of high
dimensional data. Despite this, the theoretical study of SNN graphs and graph
Laplacians remains unexplored. In this pioneering work, we make the first
contribution in this direction. We show that large scale asymptotics of an SNN
graph Laplacian reach a consistent continuum limit; this limit is the same as
that of a $k$-NN graph Laplacian. Moreover, we show that the pointwise
convergence rate of the graph Laplacian is linear with respect to $(k/n)^{1/m}$
with high probability.
( 2 min )
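A minimal sketch of the SNN construction: compute each point's k-NN set under a primary distance, then weight each pair by the overlap of those sets (the secondary similarity). The toy points form two clusters and are purely illustrative:

```python
def knn_sets(points, k):
    """Primary measure: k-nearest-neighbour index sets under squared Euclidean distance."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    sets = []
    for i, p in enumerate(points):
        ranked = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: d2(p, points[j]))
        sets.append(set(ranked[:k]))
    return sets

def snn_similarity(points, k):
    """Secondary measure: SNN weight of (i, j) = size of the overlap of their k-NN sets."""
    nn = knn_sets(points, k)
    n = len(points)
    return [[len(nn[i] & nn[j]) for j in range(n)] for i in range(n)]

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
w = snn_similarity(pts, k=2)    # within-cluster pairs share neighbours
```

Because the weight depends only on neighbour rankings, not raw distances, it is the quantity whose graph Laplacian the paper analyzes.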
Hi everyone, I'm doing a personal project on what people think about music-generating AIs. It would be very helpful if you could take the time to complete this survey; it takes about 5 minutes. Thank you so much for your participation.
https://docs.google.com/forms/d/e/1FAIpQLSfLHjRaWAsdGrK6Zn8X-CW17Vjn0W8EJEwEflnX7ucWn2eGBA/viewform?usp=pp_url
submitted by /u/KindlyGuess419
( 41 min )
Microsoft hooks ChatGPT up to a robot, NVIDIA promises to improve AI performance 1 million times over the next decade, AWS hugs Hugging Face, ControlNet takes image generation by storm, and more -
https://scottswigart.substack.com/p/whats-new-in-generative-ai-2023-02
submitted by /u/smswigart
( 41 min )
Meet the Google for Startups Accelerator Canada class of 2023!
Bidmii is an online marketplace that quickly connects homeowners and contractors for home improvement projects, guaranteeing payment security for each party by holding payments in trust.
Chimoney enables businesses to send payments to phones, emails and Twitter, regardless of scale, currency, country and other factors.
Clavis Studio is an AI and machine learning (ML)-driven design, visualization, and sourcing platform that provides a marketplace for designers and decorators to source new clients and use supporting tools to deliver their projects.
Foqus Technologies is an AI and quantitative imaging technology company that designs and develops software solutions to enhance the speed and quality of MRI scans.
Gryd Digital …
( 43 min )
We have ancient biology, medieval institutions, and we are approaching godlike technology. There are so many nightmares that could play out and we have to be conscious of them at all times. Setting up AI systems correctly and ensuring that our rulers are responsible is the number one priority. But what happens if we do manage to retain control and agency?
If humanity can pull this off, then perhaps we can begin to imagine the potential that awaits us. We are the human beings who get to live through this most crucial period. What more incredible and meaningful time could there be than seeing, and being part of, the potential transformation of our species?
https://youtu.be/TQ36hkxIx74
This video explores the concepts postulated by AI philosophers Nick Bostrom and Ray Kurzweil and entertains a cautious optimism about the future of humanity.
submitted by /u/Allisblissallislife
( 44 min )
Model tuning is the experimental process of finding the optimal parameters and configurations for a machine learning (ML) model that result in the best possible desired outcome with a validation dataset. Single objective optimization with a performance metric is the most common approach for tuning ML models. However, in addition to predictive performance, there may […]
( 12 min )
https://www.legoscript.com/these-companies-are-replacing-workers-with-chatgpt-
submitted by /u/pyactee
( 41 min )
As computing and AI advancements spanning decades are enabling incredible opportunities for people and society, they’re also raising questions about responsible development and deployment. For example, the machine learning models powering AI systems may not perform the same for everyone or every condition, potentially leading to harms related to safety, reliability, and fairness. Single metrics […]
The post Responsible AI: The research collaboration behind new open-source tools offered by Microsoft appeared first on Microsoft Research.
( 13 min )
There are a lot of chatbot-based apps that are basically internet text generators with a bit of introductory stage-setting to nudge the interaction into "user talks to helpful chatbot" as opposed to literally any other dialog on the web. Not surprisingly, these are susceptible to a user resetting
( 5 min )
AI Weirdness: the strange side of machine learning
( 2 min )
From scaling mountains in the annual California Death Ride bike challenge to creating a low-cost, open-source ventilator in the early days of the COVID-19 pandemic, NVIDIA Chief Scientist Bill Dally is no stranger to accomplishing near-impossible feats. On Friday, he achieved another rare milestone: induction into the Silicon Valley Engineering Council’s Hall of Fame. The Read article >
Telcos are seeking industry-standard solutions that can run 5G, AI applications and immersive graphics workloads on the same server — including for computer vision and the metaverse. To meet this need, NVIDIA is developing a new AI-on-5G solution that combines 5G vRAN, edge AI and digital twin workloads on an all-in-one, hyperconverged and GPU-accelerated system.
I created two AI ChatGPT Wizards that rap battle based on topics in the twitch chat.
https://www.twitch.tv/fleetyfleet
submitted by /u/fleetisme
Artificial intelligence (AI) is one of the most discussed technologies nowadays. It can alter how we live and work, yet there are concerns about its societal impact. In this blog post, we will look at the benefits and drawbacks of artificial intelligence.
submitted by /u/Boce77
Before
Original Image: https://i.ibb.co/2t1XdZQ/13er.jpg (By Getty Images)
After
Version 1: https://i.ibb.co/ZYqP1LB/1903163b-ed82-4676-b220-84d194557ac3.jpg
Version 2: https://i.ibb.co/phqQK2g/ca4b8237-7986-461d-bf4c-3c47427f2be3.png
My Question
Do these look good to you guys? Please feel free to give me some feedback. Thanks!
submitted by /u/Jealous_Ad8132
Hidden Markov Model implementations in R and Python for discrete and continuous observations. I have a YouTube tutorial explaining the use and modeling of HMMs and how to run these two packages.
Code:
https://github.com/manitadayon/CD_HMM (in R)
https://github.com/manitadayon/Auto_HMM (in Python)
Tutorial:
https://www.youtube.com/watch?v=1b-sd7gulFk&ab_channel=AIandMLFundamentals
https://www.youtube.com/watch?v=ieU8JFLRw2k&ab_channel=AIandMLFundamentals
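For a sense of what these packages compute under the hood, here is a minimal sketch of the forward algorithm for a discrete-observation HMM. This is plain illustrative Python, not code from either repo; `pi`, `A`, and `B` stand for the initial-state, transition, and emission probability tables.

```python
def hmm_forward(obs, pi, A, B):
    # pi[s]: initial probability of state s
    # A[s][t]: transition probability from state s to state t
    # B[s][o]: probability of emitting symbol o while in state s
    n_states = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[sp] * A[sp][s] for sp in range(n_states)) * B[s][o]
                 for s in range(n_states)]
    return sum(alpha)  # likelihood of the whole observation sequence
```

A handy sanity check: summing this likelihood over every possible observation sequence of a fixed length should give 1.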
submitted by /u/chess9145
Hi, sorry for the likely to be dumb question.. I'm relatively new to these topics.
I have a file containing rows with variable length and a class (defined by value 0 or 1).
Is it possible (and does it make sense?) to use a k-nearest neighbors classifier on variable-length input data? The file is something like this: https://gist.github.com/edoardottt/46dd13c60408e95c1685ee88b5f6ace8
Thanks!
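It can make sense if you first map each row to a fixed-length vector — zero-padding to the longest row is the simplest (if crude) option. A stdlib-only sketch; the `pad` and `knn_predict` helpers are hypothetical names for illustration, not from any library:

```python
import math
from collections import Counter

def pad(rows, fill=0.0):
    # Zero-pad every row to the length of the longest one.
    n = max(len(r) for r in rows)
    return [list(r) + [fill] * (n - len(r)) for r in rows]

def knn_predict(X, y, query, k=3):
    # Majority vote among the k training rows closest to the query (Euclidean).
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

X = pad([[0, 0], [0, 1, 0], [5, 5], [5, 6, 0]])
y = [0, 0, 1, 1]
print(knn_predict(X, y, [0, 0, 1]))  # -> 0
```

Whether padding is appropriate depends on whether position within a row is meaningful; if not, per-row summary features (length, mean, max, ...) are a common alternative before applying k-NN.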
submitted by /u/edoardottt
To design with AI models, user experience (UX) designers must assess the fit
between the model and user needs. Based on user research, they need to
contextualize the model's behavior and potential failures within their
product-specific data instances and user scenarios. However, our formative
interviews with ten UX professionals revealed that such a proactive discovery
of model limitations is challenging and time-intensive. Furthermore, designers
often lack technical knowledge of AI and accessible exploration tools, which
challenges their understanding of model capabilities and limitations. In this
work, we introduce a failure-driven design approach to AI, a workflow that
encourages designers to explore model behavior and failure patterns early in
the design process. The implementation of fAIlureNotes, a designer-centered
failure exploration and analysis tool, supports designers in evaluating models
and identifying failures across diverse user groups and scenarios. Our
evaluation with UX practitioners shows that fAIlureNotes outperforms today's
interactive model cards in assessing context-specific model performance.
Knowledge tracing (KT) serves as a primary part of intelligent education
systems. Most current KTs either rely on expert judgments or only exploit a
single network structure, which affects the full expression of learning
features. To adequately mine features of students' learning process, Deep
Knowledge Tracing Based on Spatial and Temporal Deep Representation Learning
for Learning Performance Prediction (DKT-STDRL) is proposed in this paper.
DKT-STDRL extracts spatial features from students' learning history sequence,
and then further extracts temporal features to extract deeper hidden
information. Specifically, the DKT-STDRL model first uses a CNN to extract
spatial features from students' exercise sequences. These spatial features are
then concatenated with the original exercise features to form joint learning
features, which are fed into a BiLSTM. Finally, the BiLSTM extracts temporal
features from the joint features to predict whether a student will answer
correctly at the next time step. Experiments on the public education datasets
ASSISTment2009, ASSISTment2015, Synthetic-5, ASSISTchall, and Statics2011 prove
that DKT-STDRL can achieve better prediction effects than DKT and CKT.
Despite their growing popularity, data-driven models of real-world dynamical
systems require lots of data. However, due to sensing limitations as well as
privacy concerns, this data is not always available, especially in domains such
as energy. Pre-trained models using data gathered in similar contexts have
shown enormous potential in addressing these concerns: they can improve
predictive accuracy at a much lower observational data expense. Theoretically,
due to the risk posed by negative transfer, this improvement is however neither
uniform for all agents nor is it guaranteed. In this paper, using data from
several distributed energy resources, we investigate and report preliminary
findings on several key questions in this regard. First, we evaluate the
improvement in predictive accuracy due to pre-trained models, both with and
without fine-tuning. Subsequently, we consider the question of fairness: do
pre-trained models create equal improvements for heterogeneous agents, and how
does this translate to downstream utility? Answering these questions can help
enable improvements in the creation, fine-tuning, and adoption of such
pre-trained models.
We propose a new supervised learning method for Variational AutoEncoder (VAE)
which has a causally disentangled representation and achieves the causally
disentangled generation (CDG) simultaneously. In this paper, CDG is defined as
a generative model able to decode an output precisely according to the causally
disentangled representation. We found that the supervised regularization of the
encoder is not enough to obtain a generative model with CDG. Consequently, we
explore sufficient and necessary conditions for the decoder and the causal
effect to achieve CDG. Moreover, we propose a generalized metric measuring how
a model is causally disentangled generative. Numerical results with the image
and tabular datasets corroborate our arguments.
Our goal is to produce methods for observational causal inference that are
auditable, easy to troubleshoot, yield accurate treatment effect estimates, and
scalable to high-dimensional data. We describe an almost-exact matching
approach that achieves these goals by (i) learning a distance metric via
outcome modeling, (ii) creating matched groups using the distance metric, and
(iii) using the matched groups to estimate treatment effects. Our proposed
method uses variable importance measurements to construct a distance metric,
making it a flexible method that can be adapted to various applications.
Concentrating on the scalability of the problem in the number of potential
confounders, we operationalize our approach with LASSO. We derive performance
guarantees for settings where LASSO outcome modeling consistently identifies
all confounders (importantly without requiring the linear model to be correctly
specified). We also provide experimental results demonstrating the auditability
of matches, as well as extensions to more general nonparametric outcome
modeling.
Deep learning approaches require collection of data on many different input
features or variables for accurate model training and prediction. Since data
collection on input features could be costly, it is crucial to reduce the cost
by selecting a subset of features and developing a budget-constrained model
(BCM). In this paper, we introduce an approach to eliminating less important
features for big data analysis using Deep Neural Networks (DNNs). Once a DNN
model has been developed, we identify the weak links and weak neurons, and
remove some input features to bring the model cost within a given budget. The
experimental results show our approach is feasible and supports user selection
of a suitable BCM within a given budget.
Deep networks are susceptible to numerous types of adversarial attacks.
Certified defenses provide guarantees on a model's robustness, but most of
these defenses are restricted to a single attack type. In contrast, this paper
proposes feature partition aggregation (FPA) - a certified defense against a
union of attack types, namely evasion, backdoor, and poisoning attacks. We
specifically consider an $\ell_0$ or sparse attacker that arbitrarily controls
an unknown subset of the training and test features - even across all
instances. FPA generates robustness guarantees via an ensemble whose submodels
are trained on disjoint feature sets. Following existing certified sparse
defenses, we generalize FPA's guarantees to top-$k$ predictions. FPA
significantly outperforms state-of-the-art sparse defenses providing larger and
stronger robustness guarantees, while simultaneously being up to
5,000${\times}$ faster.
Bernstein's condition is a key assumption that guarantees fast rates in
machine learning. For example, the Gibbs algorithm with prior $\pi$ has an
excess risk in $O(d_{\pi}/n)$, as opposed to the standard
$O(\sqrt{d_{\pi}/n})$, where $n$ denotes the number of observations and
$d_{\pi}$ is a complexity parameter which depends on the prior $\pi$. In this
paper, we examine the Gibbs algorithm in the context of meta-learning, i.e.,
when learning the prior $\pi$ from $T$ tasks (with $n$ observations each)
generated by a meta distribution. Our main result is that Bernstein's condition
always holds at the meta level, regardless of its validity at the observation
level. This implies that the additional cost to learn the Gibbs prior $\pi$,
which will reduce the term $d_\pi$ across tasks, is in $O(1/T)$, instead of the
expected $O(1/\sqrt{T})$. We further illustrate how this result improves on
standard rates in three different settings: discrete priors, Gaussian priors
and mixture of Gaussians priors.
Deep learning is a crucial aspect of machine learning, but it also makes
these techniques vulnerable to adversarial examples, which can be seen in a
variety of applications. These examples can even be targeted at humans, leading
to the creation of false media, such as deepfakes, which are often used to
shape public opinion and damage the reputation of public figures. This article
will explore the concept of adversarial examples, which are comprised of
perturbations added to clean images or videos, and their ability to deceive DL
algorithms. The proposed approach achieved an accuracy of 76.2% on the DFDC
dataset.
Model parallelism is conventionally viewed as a method to scale a single
large deep learning model beyond the memory limits of a single device. In this
paper, we demonstrate that model parallelism can be additionally used for the
statistical multiplexing of multiple devices when serving multiple models, even
when a single model can fit into a single device. Our work reveals a
fundamental trade-off between the overhead introduced by model parallelism and
the opportunity to exploit statistical multiplexing to reduce serving latency
in the presence of bursty workloads. We explore the new trade-off space and
present a novel serving system, AlpaServe, that determines an efficient
strategy for placing and parallelizing collections of large deep learning
models across a distributed cluster. Evaluation results on production workloads
show that AlpaServe can process requests at up to 10x higher rates or 6x more
burstiness while staying within latency constraints for more than 99% of
requests.
Explainable Artificial Intelligence (XAI) techniques are frequently required
by users in many AI systems with the goal of understanding complex models,
their associated predictions, and gaining trust. While suitable for some
specific tasks during development, their adoption by organisations to enhance
trust in machine learning systems has unintended consequences. In this paper we
discuss XAI's limitations in deployment and conclude that transparency
alongside rigorous validation are better suited to gaining trust in AI
systems.
Despite the popularity of low-rank matrix completion, the majority of its
theory has been developed under the assumption of random observation patterns,
whereas very little is known about the practically relevant case of non-random
patterns. Specifically, a fundamental yet largely open question is to describe
patterns that allow for unique or finitely many completions. This paper
provides two such families of patterns for any rank. A key to achieving this is
a novel formulation of low-rank matrix completion in terms of Plucker
coordinates, the latter a traditional tool in computer vision. This connection
is of potential significance to a wide family of matrix and subspace learning
problems with incomplete data.
We study the statistical properties of learning to defer (L2D) to multiple
experts. In particular, we address the open problems of deriving a consistent
surrogate loss, confidence calibration, and principled ensembling of experts.
Firstly, we derive two consistent surrogates -- one based on a softmax
parameterization, the other on a one-vs-all (OvA) parameterization -- that are
analogous to the single expert losses proposed by Mozannar and Sontag (2020)
and Verma and Nalisnick (2022), respectively. We then study the frameworks'
ability to estimate P( m_j = y | x ), the probability that the jth expert will
correctly predict the label for x. Theory shows the softmax-based loss causes
mis-calibration to propagate between the estimates while the OvA-based loss
does not (though in practice, we find there are trade offs). Lastly, we propose
a conformal inference technique that chooses a subset of experts to query when
the system defers. We perform empirical validation on tasks for galaxy, skin
lesion, and hate speech classification.
Randomly pivoted Cholesky (RPCholesky) is a natural algorithm for computing a
rank-k approximation of an N x N positive semidefinite (psd) matrix. RPCholesky
can be implemented with just a few lines of code. It requires only (k+1)N entry
evaluations and O(k^2 N) additional arithmetic operations. This paper offers
the first serious investigation of its experimental and theoretical behavior.
Empirically, RPCholesky matches or improves on the performance of alternative
algorithms for low-rank psd approximation. Furthermore, RPCholesky provably
achieves near-optimal approximation guarantees. The simplicity, effectiveness,
and robustness of this algorithm strongly support its use in scientific
computing and machine learning applications.
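For reference, the core of RPCholesky really does fit in a few lines. The following NumPy sketch is my own reading of the algorithm (sample each pivot with probability proportional to the residual diagonal, then do one Cholesky elimination step), not the authors' code:

```python
import numpy as np

def rp_cholesky(A, k, seed=None):
    """Randomly pivoted partial Cholesky: returns F with A ~= F @ F.T (rank k)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.array(np.diag(A), dtype=float)  # diagonal of the residual matrix
    for i in range(k):
        # Sample a pivot with probability proportional to the residual diagonal.
        p = rng.choice(n, p=d / d.sum())
        # Residual column at the pivot: A[:, p] minus what F already explains.
        g = A[:, p] - F[:, :i] @ F[p, :i]
        F[:, i] = g / np.sqrt(g[p])
        d = np.maximum(d - F[:, i] ** 2, 0.0)
    return F
```

Each step reads one column of A plus the initial diagonal, consistent with the (k+1)N entry-evaluation count quoted above.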
Understanding when and how much a model gradient leaks information about the
training sample is an important question in privacy. In this paper, we present
a surprising result: even without training or memorizing the data, we can fully
reconstruct the training samples from a single gradient query at a randomly
chosen parameter value. We prove the identifiability of the training data under
mild conditions: with shallow or deep neural networks and a wide range of
activation functions. We also present a statistically and computationally
efficient algorithm based on tensor decomposition to reconstruct the training
data. As a provable attack that reveals sensitive training data, our findings
suggest potential severe threats to privacy, especially in federated learning.
Bayesian Optimization is a useful tool for experiment design. Unfortunately,
the classical, sequential setting of Bayesian Optimization does not translate
well into laboratory experiments, for instance battery design, where
measurements may come from different sources and their evaluations may require
significant waiting times. Multi-fidelity Bayesian Optimization addresses the
setting with measurements from different sources. Asynchronous batch Bayesian
Optimization provides a framework to select new experiments before the results
of the prior experiments are revealed. This paper proposes an algorithm
combining multi-fidelity and asynchronous batch methods. We empirically study
the algorithm behavior, and show it can outperform single-fidelity batch
methods and multi-fidelity sequential methods. As an application, we consider
designing electrode materials for optimal performance in pouch cells using
experiments with coin cells to approximate battery performance.
Happy Friday! Register now for a webinar we have coming up next Tuesday at 12PM ET: Architectures for Running ML at the Edge, presented by ODSC! Registration is free, sign up here.
In this webinar, we will explore different paradigms for edge deployment of ML models, including federated learning, cloud-edge hybrid architectures, and standalone edge models. We will discuss the trade-offs and considerations for each, as well as best practices for designing and deploying ML models at the edge.
Tune in Tuesday Feb. 28 @ 12PM ET. Register here.
submitted by /u/modzykirsten
Hi guys,
I have made a video on YouTube here where I explain what gradient boosting is and how it works.
I hope it may be of use to some of you out there. As always, feedback is more than welcome! :)
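For anyone who prefers code to video: the core idea — repeatedly fit a weak learner to the current residuals and add a damped copy of it to the model — fits in a short sketch. This is a generic illustration using depth-1 regression stumps on 1-D data with squared loss, not material from the video:

```python
def fit_stump(x, r):
    # Best single-threshold split minimizing squared error against residuals r.
    best = None
    for t in sorted(set(x))[:-1]:  # the max value would leave an empty right side
        lm = sum(ri for xi, ri in zip(x, r) if xi <= t) / sum(1 for xi in x if xi <= t)
        rm = sum(ri for xi, ri in zip(x, r) if xi > t) / sum(1 for xi in x if xi > t)
        err = sum((ri - (lm if xi <= t else rm)) ** 2 for xi, ri in zip(x, r))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    f0 = sum(y) / len(y)                 # start from the mean prediction
    stumps, pred = [], [f0] * len(x)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        s = fit_stump(x, resid)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + lr * sum(s(xi) for s in stumps)
```

With squared loss the "gradient" step literally fits the residuals; other losses swap in their own negative gradient, which is what makes the method "gradient" boosting.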
submitted by /u/Personal-Trainer-541
AI has the potential to revolutionize fraud detection by financial institutions, providing faster and more accurate detection of fraudulent activities. Here we present some ways in which AI can be used to detect and prevent fraud. https://youtu.be/luX9ecRwn_c
submitted by /u/eprepsg
https://twitter.com/GuillaumeLample/status/1629151231800115202?t=4cLD6Ko2Ld9Y3EIU72-M2g&s=19
Paper here - https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
submitted by /u/MysteryInc152
Excited to share "Minds", a new way to build backends and workflows entirely with AI (LLMs from OpenAI and Cohere). The AI can call your APIs, look things up in your database, etc.
With just a couple of lines of code you can build things like a question-answering service where the AI queries your local database to help answer customer support queries.
https://github.com/dosco/minds
submitted by /u/gsvclass
A recent podcast interview of EY's has gone a bit viral, and in it he claims that researchers have dismissed his views without seriously engaging with his arguments, which are described here in relative detail.
I'm aware of on-going AI safety and interpretability research, but the dual use of the term "AI safety" to mean something close to AI ethics, and something close to preventing an existential threat to humanity, makes distinguishing the goals of, say, Anthropic, and the extent to which they consider the latter a serious concern, difficult as a layperson.
I haven't personally found EY's arguments to be particularly rigorous, but I'm not the best suited person to evaluate their validity. Any thoughts are appreciated. Thanks in advance!
submitted by /u/SchmidhuberDidIt
In this blog post we discuss how to accelerate disaster response efforts by using computer vision techniques to process satellite imagery with AWS services.
Amazon SageMaker multi-model endpoints (MMEs) provide a scalable and cost-effective way to deploy a large number of machine learning (ML) models. It gives you the ability to deploy multiple ML models in a single serving container behind a single endpoint. From there, SageMaker manages loading and unloading the models and scaling resources on your behalf […]
Cloudy British weather is the butt of many jokes — but the United Kingdom’s national power grid is making the most of its sunshine. With the help of Open Climate Fix, a nonprofit product lab, the control room of the National Grid Electricity System Operator (ESO) is testing AI models that provide granular, near-term forecasts.
I am looking at OpenAI's implementation of SAC over here. Also, here is their code to compute the action and its log prob -
class SquashedGaussianMLPActor(nn.Module):

    def __init__(self, obs_dim, act_dim, hidden_sizes, activation, act_limit):
        super().__init__()
        self.net = mlp([obs_dim] + list(hidden_sizes), activation, activation)
        self.mu_layer = nn.Linear(hidden_sizes[-1], act_dim)
        self.log_std_layer = nn.Linear(hidden_sizes[-1], act_dim)
        self.act_limit = act_limit

    def forward(self, obs, deterministic=False, with_logprob=True):
        net_out = self.net(obs)
        mu = self.mu_layer(net_out)
        log_std = self.log_std_layer(net_out)
        log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
        std = torch.exp(log_std)

        # Pre-squash distribution and sample
        pi_distribution = Normal(mu, std)
        if deterministic:
            # O…
"Hotter take: ML would have advanced faster if another front-end language had been available and widely adopted instead of Python. One that is interactive yet fast & compilable, multithreaded (no GIL), isn't bloated, doesn't care about white spaces,... E.g. Julia or some Lisp."
Link from the original tweet
submitted by /u/Marcapiel
Over the last 10 years, a number of players have developed autonomous vehicle (AV) systems using deep neural networks (DNNs). These systems have evolved from simple rule-based systems to Advanced Driver Assistance Systems (ADAS) and fully autonomous vehicles. These systems require petabytes of data and thousands of compute units (vCPUs and GPUs) to train. This […]
https://www.legoscript.com/we-will-die-if-not-careful
submitted by /u/pyactee
Discover the top 5 uses of UI/UX design in 2023. Engage your users, increase conversion rates, and boost ROI with better user experiences.
The post Maximizing Business Success with UI/UX Design: The Top 5 Advantages appeared first on Data Science Central.
The do-it-yourself climate modeling movement is here. Researchers from Northwestern University and Argonne National Laboratory have been launching NVIDIA Jetson-driven edge computing Waggle devices across the globe to collect hyper-local climate information. Waggle is an open source sensor platform for edge computing developed by Argonne. Working with this, scientists share open-source AI code designed for […]
A million developers across the globe are now using the NVIDIA Jetson platform for edge AI and robotics to build innovative technologies. Plus, more than 6,000 companies — a third of which are startups — have integrated the platform with their products. These milestones and more will be celebrated during the NVIDIA Jetson Edge AI […]
To drive the automotive industry forward, NVIDIA and Mercedes-Benz are taking the virtual road. NVIDIA founder and CEO Jensen Huang joined Mercedes-Benz CEO Ola Källenius on stage at the automaker’s strategy update event yesterday in Silicon Valley, showcasing progress in their landmark partnership to digitalize the entire product lifecycle, plus the ownership and automated driving […]
The cloud just got bigger. NVIDIA and Microsoft announced this week they’re working to bring top PC Xbox Game Studios games to the GeForce NOW library, including titles from Bethesda, Mojang Studios and Activision, pending closure of Microsoft’s acquisition. With six new games joining the cloud this week for members to stream, it’s a jam-packed […]
This post is co-written with Swagata Ashwani, Senior Data Scientist at Boomi. Boomi is an enterprise-level software as a service (SaaS) independent software vendor (ISV) that creates developer enablement tooling for software engineers. These tools integrate via API into Boomi’s core service offering. In this post, we discuss how Boomi used the bring-your-own-container (BYOC) approach […]
"Deep learning is the only thing that currently works at scale; it's the only class of algorithms that is able to discover arbitrary functions in a reasonable amount of time."
https://www.youtube.com/watch?v=p-OYPRhqRCg
I know of the universal approximation theorem. But is there any mathematical formulation of this statement?
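The closest standard formalization I know of covers only the expressivity half of the claim — the universal approximation theorem (Cybenko 1989; Hornik 1991), stated informally:

```latex
% For any continuous f on a compact K \subset \mathbb{R}^d, any \varepsilon > 0,
% and a suitable (non-polynomial) activation \sigma, there exist N and
% parameters c_i, w_i, b_i such that
\sup_{x \in K}\;\Bigl|\, f(x) - \sum_{i=1}^{N} c_i\,\sigma(w_i^{\top} x + b_i) \Bigr| < \varepsilon
```

Note that this says nothing about whether gradient descent finds such an approximation "in a reasonable amount of time" — as far as I know, that part of the quoted claim has no comparably general theorem behind it.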
submitted by /u/GraciousReformer
Laptops equipped with NVIDIA GeForce RTX 4070, 4060 and 4050 GPUs are now available. The new lineup — including NVIDIA Studio-validated laptops from ASUS, GIGABYTE and Samsung — gives creators more options to create from anywhere with lighter, thinner devices that dramatically exceed the performance of the last generation.
Similar to product explainer video like here: https://www.youtube.com/playlist?list=PL2P1Z-F3mmqxsMlpCp6wpeqAqlusiuZ_h
I've tried different services, but either the result was not good enough (e.g. Steve.ai has a "script to animation" feature, but the result was very limited) or the service did not cover script-to-video (e.g. https://www.synthesia.io/)
submitted by /u/muran123456
( 41
min )
I have a lot of photos on my portfolio website and usually post them on social media in series, like this example, but I want to find some new and creative ways to combine/curate photos differently that are visually appealing. To come up with ideas outside of my own head, I thought maybe there is a tool that can help.
submitted by /u/Northlandscapes
( 41
min )
After you build, train, and evaluate your machine learning (ML) model to ensure it’s solving the intended business problem proposed, you want to deploy that model to enable decision-making in business operations. Models that support business-critical functions are deployed to a production environment where a model release strategy is put in place. Given the nature […]
( 15
min )
We’re thrilled to announce an expanded collaboration between AWS and Hugging Face to accelerate the training, fine-tuning, and deployment of large language and vision models used to create generative AI applications. Generative AI applications can perform a variety of tasks, including text summarization, answering questions, code generation, image creation, and writing essays and articles. AWS […]
( 4
min )
Announcements: Data passivity and the current obsession with off-the-shelf chatbots. Last September, Bill Schmarzo (“Point – Counterpoint on Why Organizations Suck at AI”) listed a few common excuses enterprises use to explain why they aren’t doing more with AI: We Don’t Have the Right Talent. “We can’t hire the right talent and don’t have bottomless budgets…
The post DSC Weekly 21 February 2023 – Data Passivity and the Current Obsession with Off-The-Shelf Chatbots appeared first on Data Science Central.
( 20
min )
With every passing year, data analytics services are gaining more prominence as most enterprises are realizing the potential of data in driving important business decisions. The growing availability of data, developments in technology, and mounting demand for data-driven insights will contribute to this trend. Additionally, the upsurge of big data and cloud computing will make it easier…
The post The Impact of AI-enabled Data Analytics Services Across Major Industries appeared first on Data Science Central.
( 22
min )
Cybercriminals still attack startup businesses even though they may have smaller databases and less information to steal compared to the big players in the market. Why? Bad actors take the path of least resistance, and startups tend to be less equipped to defend against cyber attacks, spending an average of $500 or less on cybersecurity…
The post How to Build a Robust Cybersecurity Strategy for Your Startup appeared first on Data Science Central.
( 24
min )
The telecommunications industry has for decades helped advance revolutionary change – enabling everything from telephones and television to online streaming and self-driving cars. Yet the industry has long been considered an evolutionary mover in its own business. A recent survey of more than 400 telecommunications industry professionals from around the world found that same cautious […]
( 6
min )
Structural information of phylogenetic tree topologies plays an important
role in phylogenetic inference. However, finding appropriate topological
structures for specific phylogenetic inference tasks often requires significant
design effort and domain expertise. In this paper, we propose a novel
structural representation method for phylogenetic inference based on learnable
topological features. By combining the raw node features that minimize the
Dirichlet energy with modern graph representation learning techniques, our
learnable topological features can provide efficient structural information of
phylogenetic trees that automatically adapts to different downstream tasks
without requiring domain expertise. We demonstrate the effectiveness and
efficiency of our method on a simulated data tree probability estimation task
and a benchmark of challenging real data variational Bayesian phylogenetic
inference problems.
( 2
min )
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular
adaptive (self-tuning) method for first-order stochastic optimization. Despite
being well studied, existing analyses of this method suffer from various
shortcomings: they either assume some knowledge of the problem parameters,
impose strong global Lipschitz conditions, or fail to give bounds that hold
with high probability. We provide a comprehensive analysis of this basic method
without any of these limitations, in both the convex and non-convex (smooth)
cases, which additionally supports a general "affine variance" noise model and
provides sharp rates of convergence in both the low-noise and high-noise
regimes.
( 2
min )
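The abstract above can be grounded with a minimal sketch of SGD with AdaGrad stepsizes. Note this is a toy deterministic example of the self-tuning idea; the objective, base stepsize, and iteration count are illustrative choices, not from the paper:

```python
import math

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=100):
    """SGD with AdaGrad stepsizes: divide the base stepsize by the square
    root of the accumulated squared gradients, so the method self-tunes
    without knowledge of the problem parameters."""
    x, g_sq = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        g_sq += g * g                           # accumulate squared gradients
        x -= lr * g / (math.sqrt(g_sq) + eps)   # adaptive step
    return x

# Minimize f(x) = x^2 (gradient 2x) starting from x0 = 1.0.
x_final = adagrad(lambda x: 2.0 * x, 1.0)
```

The effective stepsize shrinks automatically as gradients accumulate, which is what makes the method attractive without global Lipschitz knowledge.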
In this paper, we investigate the impact of stochasticity and large stepsizes
on the implicit regularisation of gradient descent (GD) and stochastic gradient
descent (SGD) over diagonal linear networks. We prove the convergence of GD and
SGD with macroscopic stepsizes in an overparametrised regression setting and
characterise their solutions through an implicit regularisation problem. Our
crisp characterisation leads to qualitative insights about the impact of
stochasticity and stepsizes on the recovered solution. Specifically, we show
that large stepsizes consistently benefit SGD for sparse regression problems,
while they can hinder the recovery of sparse solutions for GD. These effects
are magnified for stepsizes in a tight window just below the divergence
threshold, in the "edge of stability" regime. Our findings are supported by
experimental results.
( 2
min )
We develop inductive biases for the machine learning of complex physical
systems based on the port-Hamiltonian formalism. To satisfy by construction the
principles of thermodynamics in the learned physics (conservation of energy,
non-negative entropy production), we modify accordingly the port-Hamiltonian
formalism so as to achieve a port-metriplectic one. We show that the
constructed networks are able to learn the physics of complex systems by parts,
thus alleviating the burden associated with the experimental characterization
and subsequent learning process of such systems. Predictions can nevertheless
be made at the scale of the complete system. Examples are shown on the
performance of the proposed technique.
( 2
min )
Federated learning (FL) is a privacy-preserving learning technique that
enables distributed computing devices to train shared learning models across
data silos collaboratively. Existing FL works mostly focus on designing
advanced FL algorithms to improve the model performance. However, the economic
considerations of the clients, such as fairness and incentive, are yet to be
fully explored. Without such considerations, self-motivated clients may lose
interest and leave the federation. To address this problem, we designed a novel
incentive mechanism that involves a client selection process to remove
low-quality clients and a money transfer process to ensure a fair reward
distribution. Our experimental results strongly demonstrate that the proposed
incentive mechanism can effectively improve the duration and fairness of the
federation.
( 2
min )
Hello everyone,
It's that time again, thank you all so much for the support you've given us over here. I've done a ton of typing this morning, so for a summary of what I've updated, you can see the higher-level twitter thread I wrote at https://twitter.com/hi_tysam/status/1627679672988319746?cxt=HHwWhIC-yb2C15YtAAAA, or the more detailed (but still rough cut) patch notes I wrote this morning at https://github.com/tysam-code/hlb-CIFAR10/releases/tag/v0.5.0
Happy to answer any questions anyone might have, cheers! :D :))))
submitted by /u/tysam_and_co
( 43
min )
In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart. Stable Diffusion is a deep learning model that allows you to generate realistic, high-quality images and stunning art in just a few seconds. Although creating impressive images can find use in industries ranging from […]
( 18
min )
https://avaturn.me/
submitted by /u/theaiguru
( 40
min )
I have written a blog post explaining the Barlow Twins paper from Meta AI. Can you guys have a read and provide suggestions to improve it further? Thanks in advance!
https://pmgautam.com/posts/barlow-twins-explanation.html
submitted by /u/pmgautam_
( 42
min )
I am following this implementation of DDPG and found this code:
self.linear3.weight.data.uniform_(-init_w, init_w)
It seems like the author is forcing the weights of the final layer to follow a uniform distribution.
Why is the author only replacing the final layer weights?
How does uniform weight initialization help?
I have heard a lot about the usefulness of Orthogonal initialization. This is the first time, I have seen the above type of initialization.
submitted by /u/Academic-Rent7800
( 42
min )
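One common answer, sketched numerically below: the DDPG paper initializes only the final layers of the actor and critic from a small uniform range (conventionally 3e-3) so the initial actions and Q-values start near zero. The layer size and activations here are illustrative assumptions, not taken from the linked code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for hidden activations (bounded in [-1, 1], e.g. after tanh).
hidden = rng.uniform(-1.0, 1.0, size=64)

# Final layer only: uniform in [-init_w, init_w], as in the DDPG paper.
init_w = 3e-3
w_final = rng.uniform(-init_w, init_w, size=64)
b_final = rng.uniform(-init_w, init_w)

# Initial policy output: since |w| <= 3e-3 and |hidden| <= 1, the
# pre-activation is bounded by 64 * 3e-3 + 3e-3 < 0.2, so the untrained
# actor starts with near-zero actions instead of arbitrary large ones.
action = np.tanh(w_final @ hidden + b_final)
```

Earlier layers keep a fan-in-scaled initialization; only the output layer is squashed this way, which keeps early exploration driven by the added noise process rather than by random initial outputs.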
This note describes a new approach to classifying graphs that leverages graph
generative models (GGM). Assuming a GGM that defines a joint probability
distribution over graphs and their class labels, I derive classification
formulas for the probability of a class label given a graph. A new conditional
ELBO can be used to train a generative graph auto-encoder model for
discrimination. While leveraging generative models for classification has been
well explored for non-relational i.i.d. data, to my knowledge this is a novel
approach to graph classification.
( 2
min )
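The classification rule such a note derives is Bayes' rule over the generative model: p(y | G) ∝ p(G | y) p(y). A minimal numeric sketch, where the per-class log-likelihoods and prior are made-up placeholders standing in for a trained graph generative model:

```python
import math

def classify(log_lik, log_prior):
    """Posterior over class labels from per-class generative log-likelihoods:
    p(y|G) = p(G|y) p(y) / sum_y' p(G|y') p(y'), computed in log space."""
    scores = [ll + lp for ll, lp in zip(log_lik, log_prior)]
    m = max(scores)                                   # log-sum-exp stabilization
    z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [math.exp(s - z) for s in scores]

# Hypothetical log p(G|y) for two classes under a fitted GGM, uniform prior.
posterior = classify(log_lik=[-10.0, -12.0], log_prior=[math.log(0.5)] * 2)
```

With a 2-nat likelihood gap and a uniform prior, the first class gets posterior 1/(1+e^-2) ≈ 0.88.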
This work explains in detail the theory behind Complex-Valued Neural Network
(CVNN), including Wirtinger calculus, complex backpropagation, and basic
modules such as complex layers, complex activation functions, or complex weight
initialization. We also show the impact of not adapting the weight
initialization correctly to the complex domain. This work places a strong
focus on the implementation of such modules in Python using the cvnn toolbox.
We also perform simulations on real-valued data, cast to the complex domain by
means of the Hilbert transform, and verify the potential interest of CVNN even
for non-complex data.
( 2
min )
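On the weight-initialization point: a common complex-domain adaptation of Glorot initialization splits the target variance evenly between real and imaginary parts. This sketches the general idea under assumed fan sizes, not necessarily the cvnn toolbox's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_glorot(fan_in, fan_out, size):
    """Glorot/Xavier target variance 2/(fan_in + fan_out), split evenly
    between real and imaginary parts so that E[|w|^2] matches the
    real-valued case."""
    sigma = np.sqrt(2.0 / (fan_in + fan_out))
    # Each component gets variance sigma^2 / 2; total complex variance sigma^2.
    re = rng.normal(0.0, sigma / np.sqrt(2.0), size)
    im = rng.normal(0.0, sigma / np.sqrt(2.0), size)
    return re + 1j * im

w = complex_glorot(64, 64, size=100_000)
# Empirical complex variance E[|w|^2] should be close to 2/(64+64) = 1/64.
var = np.mean(np.abs(w) ** 2)
```

Skipping the factor-of-two split (i.e., reusing the real-valued variance per component) doubles the effective variance, which is exactly the kind of mis-adapted initialization whose impact the paper examines.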
Lung cancer is the leading cause of death among different types of cancers.
Every year, the lives lost due to lung cancer exceed those lost to pancreatic,
breast, and prostate cancer combined. The survival rate for lung cancer
patients is very low compared to other cancer patients due to late diagnostics.
Thus, early lung cancer diagnostics is crucial for patients to receive early
treatments, increasing the survival rate or even becoming cancer-free. This
paper proposes a deep-learning model for early lung cancer prediction and
diagnosis from Computed Tomography (CT) scans. The proposed model achieves high
accuracy. In addition, it can be a beneficial tool to support radiologists'
decisions in predicting and detecting lung cancer and its stage.
( 2
min )
Graph neural networks (GNNs) are able to leverage the structure of graph data
by passing messages along the edges of the graph. While this allows GNNs to
learn features depending on the graph structure, for certain graph topologies
it leads to inefficient information propagation and a problem known as
oversquashing. This has recently been linked with the curvature and spectral
gap of the graph. On the other hand, adding edges to the message-passing graph
can lead to increasingly similar node representations and a problem known as
oversmoothing. We propose a computationally efficient algorithm that prevents
oversquashing by systematically adding edges to the graph based on spectral
expansion. We combine this with a relational architecture, which lets the GNN
preserve the original graph structure and provably prevents oversmoothing. We
find experimentally that our algorithm outperforms existing graph rewiring
methods in several graph classification tasks.
( 2
min )
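The spectral-gap intuition behind such rewiring can be checked directly. This toy comparison is not the paper's algorithm, only the quantity it targets: adding edges to a poorly connected graph raises the second-smallest Laplacian eigenvalue, the value tied to oversquashing.

```python
import numpy as np

def spectral_gap(adj):
    """Second-smallest eigenvalue of the graph Laplacian L = D - A
    (algebraic connectivity; larger means better expansion/mixing)."""
    deg = np.diag(adj.sum(axis=1))
    lam = np.linalg.eigvalsh(deg - adj)
    return lam[1]

n = 4
# Path graph 0-1-2-3: a bottlenecked topology prone to oversquashing.
path = np.zeros((n, n))
for i in range(n - 1):
    path[i, i + 1] = path[i + 1, i] = 1.0

# Complete graph: maximally connected, large spectral gap.
complete = np.ones((n, n)) - np.eye(n)

gap_path, gap_complete = spectral_gap(path), spectral_gap(complete)
```

For the 4-node path the gap is 2 - √2 ≈ 0.59, while for K4 it is 4; a rewiring method moves the message-passing graph toward the latter regime while (in the relational variant) tagging original versus added edges to preserve structure.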
In this work, we propose a zero-shot voice conversion method using speech
representations trained with self-supervised learning. First, we develop a
multi-task model to decompose a speech utterance into features such as
linguistic content, speaker characteristics, and speaking style. To disentangle
content and speaker representations, we propose a training strategy based on
Siamese networks that encourages similarity between the content representations
of the original and pitch-shifted audio. Next, we develop a synthesis model
with pitch and duration predictors that can effectively reconstruct the speech
signal from its decomposed representation. Our framework allows controllable
and speaker-adaptive synthesis to perform zero-shot any-to-any voice conversion
achieving state-of-the-art results on metrics evaluating speaker similarity,
intelligibility, and naturalness. Using just 10 seconds of data for a target
speaker, our framework can perform voice swapping and achieves a speaker
verification EER of 5.5% for seen speakers and 8.4% for unseen speakers.
( 2
min )
The increasing application of Artificial Intelligence and Machine Learning
models poses potential risks of unfair behavior and, in light of recent
regulations, has attracted the attention of the research community. Several
researchers focused on seeking new fairness definitions or developing
approaches to identify biased predictions. However, none tries to exploit the
counterfactual space to this end. In that direction, the methodology proposed
in this work aims to unveil unfair model behaviors using counterfactual
reasoning in the fairness under unawareness setting. A counterfactual
version of equal opportunity named counterfactual fair opportunity is defined
and two novel metrics that analyze the sensitive information of counterfactual
samples are introduced. Experimental results on three different datasets show
the efficacy of our methodologies and our metrics, disclosing the unfair
behavior of classic machine learning and debiasing models.
( 2
min )
Spherical harmonics provide a smooth, orthogonal, and symmetry-adapted basis
to expand functions on a sphere, and they are used routinely in computer
graphics, signal processing and different fields of science, from geology to
quantum chemistry. More recently, spherical harmonics have become a key
component of rotationally equivariant models for geometric deep learning, where
they are used in combination with distance-dependent functions to describe the
distribution of neighbors within local spherical environments within a point
cloud. We present a fast and elegant algorithm for the evaluation of the
real-valued spherical harmonics. Our construction integrates many of the
desirable features of existing schemes and makes it possible to compute
Cartesian derivatives in a numerically stable and computationally efficient
manner. We
provide an efficient C implementation of the proposed algorithm, along with
easy-to-use Python bindings.
( 2
min )
We present Trieste, an open-source Python package for Bayesian optimization
and active learning benefiting from the scalability and efficiency of
TensorFlow. Our library enables the plug-and-play of popular TensorFlow-based
models within sequential decision-making loops, e.g. Gaussian processes from
GPflow or GPflux, or neural networks from Keras. This modular mindset is
central to the package and extends to our acquisition functions and the
internal dynamics of the decision-making loop, both of which can be tailored
and extended by researchers or engineers when tackling custom use cases.
Trieste is a research-friendly and production-ready toolkit backed by a
comprehensive test suite, extensive documentation, and available at
https://github.com/secondmind-labs/trieste.
( 2
min )
In this work we developed a deep learning technique that successfully solves
a non-linear dynamic control problem. Instead of directly tackling the control
problem, we combined methods in probabilistic neural networks and a
Kalman-Filter-inspired model to build a non-linear state estimator for the
system. We then used the estimated states to implement a trivial controller for
the now fully observable system. We applied this technique to a crucial
non-linear control problem that arises in the operation of the LIGO system, an
interferometric gravitational-wave observatory. We demonstrated in simulation
that our approach can learn from data to estimate the state of the system,
allowing successful control of the interferometer's mirror. We also
developed a computationally efficient model that can run in real time at high
sampling rate on a single modern CPU core, one of the key requirements for the
implementation of our solution in the LIGO digital control system. We believe
these techniques could be used to help tackle similar non-linear control
problems in other applications.
( 2
min )
Robotics, automation, and related Artificial Intelligence (AI) systems have
become pervasive bringing in concerns related to security, safety, accuracy,
and trust. With growing dependency on physical robots that work in close
proximity to humans, the security of these systems is becoming increasingly
important to prevent cyber-attacks that could lead to privacy invasion,
critical operations sabotage, and bodily harm. The current shortfall of
professionals who can defend such systems demands development and integration
of such a curriculum. This course description includes details about seven
self-contained and adaptive modules on "AI security threats against pervasive
robotic systems". Topics include: 1) Introduction, examples of attacks, and
motivation; 2) Robotic AI attack surfaces and penetration testing; 3) Attack
patterns and security strategies for input sensors; 4) Training attacks and
associated security strategies; 5) Inference attacks and associated security
strategies; 6) Actuator attacks and associated security strategies; and 7)
Ethics of AI, robotics, and cybersecurity.
( 2
min )
Decentralised Machine Learning (DML) enables collaborative machine learning
without centralised input data. Federated Learning (FL) and Edge Inference are
examples of DML. While tools for DML (especially FL) are starting to flourish,
many are not flexible and portable enough to experiment with novel systems
(e.g., RISC-V), non-fully connected topologies, and asynchronous collaboration
schemes. We overcome these limitations via a domain-specific language that
maps DML schemes to an underlying middleware, i.e. the FastFlow parallel
programming library. We experiment with it by generating different working DML
schemes on two emerging architectures (ARM-v8, RISC-V) and the x86-64 platform.
We characterise the performance and energy efficiency of the presented schemes
and systems. As a byproduct, we introduce a RISC-V porting of the PyTorch
framework, the first publicly available to our knowledge.
( 2
min )
This paper considers the use of recently proposed optimal transport-based
multivariate test statistics, namely rank energy and its variant the soft rank
energy derived from entropically regularized optimal transport, for the
unsupervised nonparametric change point detection (CPD) problem. We show that
the soft rank energy enjoys both fast rates of statistical convergence and
robust continuity properties which lead to strong performance on real datasets.
Our theoretical analyses remove the need for resampling and out-of-sample
extensions previously required to obtain such rates. In contrast the rank
energy suffers from the curse of dimensionality in statistical estimation and
moreover can signal a change point from arbitrarily small perturbations, which
leads to a high rate of false alarms in CPD. Additionally, under mild
regularity conditions, we quantify the discrepancy between soft rank energy and
rank energy in terms of the regularization parameter. Finally, we show our
approach performs favorably in numerical experiments compared to several other
optimal transport-based methods as well as maximum mean discrepancy.
( 2
min )
We consider the problem of testing the identity of a reversible Markov chain
against a reference from a single trajectory of observations. Employing the
recently introduced notion of a lumping-congruent Markov embedding, we show
that, at least in a mildly restricted setting, testing identity to a reversible
chain reduces to testing to a symmetric chain over a larger state space and
recover state-of-the-art sample complexity for the problem.
( 2
min )
Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been
proposed and studied, but these risks are all at least as sensitive as the mean
to loss tails on the upside, and tend to ignore deviations on the downside. We
study a complementary new risk class that penalizes loss deviations in a
bi-directional manner, while having more flexibility in terms of tail
sensitivity than is offered by mean-variance. This class lets us derive
high-probability learning guarantees without explicit gradient clipping, and
empirical tests using both simulated and real data illustrate a high degree of
control over key properties of the test loss distribution incurred by
gradient-based learners.
( 2
min )
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error in terms of the number of
samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
( 2
min )
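The recursive estimation described above can be sketched as a Robbins-Monro root-finding scheme. The loss function and stepsizes below are illustrative assumptions, not the paper's exact algorithm: UBSR is the root t* of g(t) = E[l(X - t)] - λ (with l increasing, so g is decreasing in t), and each new sample nudges the estimate by a decaying step.

```python
def ubsr_estimate(samples, loss, lam):
    """One-sample-at-a-time stochastic approximation for the root t* of
    g(t) = E[loss(X - t)] - lam. Since loss is increasing, g decreases in t,
    so we step t upward whenever the sampled shortfall exceeds lam."""
    t = 0.0
    for k, x in enumerate(samples, start=1):
        a_k = 1.0 / k                   # Robbins-Monro stepsizes: sum a_k = inf, sum a_k^2 < inf
        t += a_k * (loss(x - t) - lam)  # noisy root-finding update
    return t

# Sanity check: with the identity loss and lam = 0, the root of
# E[X - t] = 0 is the mean, and the recursion reduces to a running average.
est = ubsr_estimate([1.0, 2.0, 3.0, 4.0], loss=lambda u: u, lam=0.0)
```

With a convex increasing loss (e.g. exponential) and λ > 0 the same recursion estimates a genuine shortfall-risk level; the non-asymptotic analysis in the paper controls how fast such iterates concentrate around t*.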
I have constructed a novel ML (NLP) dataset for classification and labeled it with three classes. The dataset is rather small with about 700 examples, out of which the classes have about 400, 200, and 100 examples respectively. I would like to publish it and describe it in an official publication for a workshop or a conference.
When looking at related datasets and publications, I see that it is common for authors to publish the dataset already split into three chunks - train, dev, and test (see the images). It is also common in these papers to report the performance of baseline models on the dataset. Considering the dataset's small size, I feel like 5-fold cross-validation would be a good alternative, rather than doing something like a split into 450-1…
( 46
min )
Do you think AI will be able to give trustable advice in the future?
Doing research for a school project. If you have the time, I would appreciate it if you could fill out this form.
https://forms.gle/X7Fg8cQsqWb278bm7
submitted by /u/Jakets_V
( 41
min )
The more widespread ChatGPT usage becomes, the more concerns the tool raises.
What do you think: is it an incredible source of inspiration or the death of art as we know it?
Would you be able to distinguish between AI-generated text and human poetry?
Take part in the experiment and share your thoughts here: ChatGPT Survey.
submitted by /u/Lonely-Wish-6377
( 41
min )
The North American reinforcement materials market is anticipated to display revenue growth at a CAGR of 5.64% through 2028. Get a free sample report:
North America Reinforcement Materials Market
submitted by /u/shreyaslakhare11
( 41
min )
The Middle East and Africa reinforcement materials market is projected to witness growth at a CAGR of 5.13% by 2028. Get a free sample report:
Middle East and Africa Reinforcement Materials Market
submitted by /u/shreyaslakhare11
( 41
min )
Europe’s reinforcement materials market is likely to register revenue growth at a CAGR of 5.87% over the period 2021-2028. Get a free sample report:
Europe Reinforcement Materials Market
submitted by /u/shreyaslakhare11
( 41
min )
The Asia-Pacific reinforcement materials market is assessed to display growth at a CAGR of 6.33% over the forecast years 2021-2028. Get a free sample report:
Asia-Pacific Reinforcement Materials Market
submitted by /u/shreyaslakhare11
( 41
min )
The global reinforcement materials market is estimated to grow at a CAGR of 6.02% and is likely to garner $12,826 million by 2028. Get a free sample report:
Reinforcement Materials Market
submitted by /u/shreyaslakhare11
( 41
min )
We consider the optimal sample complexity theory of tabular reinforcement
learning (RL) for controlling the infinite horizon discounted reward in a
Markov decision process (MDP). Optimal min-max complexity results have been
developed for tabular RL in this setting, leading to a sample complexity
dependence on $\gamma$ and $\epsilon$ of the form $\tilde
\Theta((1-\gamma)^{-3}\epsilon^{-2})$, where $\gamma$ is the discount factor
and $\epsilon$ is the solution error tolerance. However, in many applications
of interest, the optimal policy (or all policies) will induce mixing. We show
that in these settings the optimal min-max complexity is $\tilde
\Theta(t_{\text{minorize}}(1-\gamma)^{-2}\epsilon^{-2})$, where
$t_{\text{minorize}}$ is a measure of mixing that is within an equivalent
factor of the total variation mixing time. Our analysis is based on
regeneration-type ideas that, we believe, are of independent interest since
they can be used to study related problems for general state space MDPs.
( 2
min )
Variational inequalities are a broad and flexible class of problems that
includes minimization, saddle point, fixed point problems as special cases.
Therefore, variational inequalities are used in a variety of applications
ranging from equilibrium search to adversarial learning. Today's realities with
the increasing size of data and models demand parallel and distributed
computing for real-world machine learning problems, most of which can be
represented as variational inequalities. Meanwhile, most distributed approaches
have a significant bottleneck: the cost of communication. The three main
techniques to reduce both the total number of communication rounds and the cost
of one such round are the use of similarity of local functions, compression of
transmitted information and local updates. In this paper, we combine all these
approaches. Such a triple synergy did not exist before for variational
inequalities and saddle problems, nor even for minimization problems. The
methods presented in this paper have the best theoretical guarantees of
communication complexity and are significantly ahead of other methods for
distributed variational inequalities. The theoretical results are confirmed by
adversarial learning experiments on synthetic and real datasets.
( 2
min )
We prove that various stochastic gradient descent methods, including the
stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic
Nesterov's accelerated gradient (SNAG) methods, almost surely avoid any strict
saddle manifold. To the best of our knowledge, this is the first time such
results are obtained for SHB and SNAG methods. Moreover, our analysis expands
upon previous studies on SGD by removing the need for bounded gradients of the
objective function and uniformly bounded noise. Instead, we introduce a more
practical local boundedness assumption for the noisy gradient, which is
naturally satisfied in empirical risk minimization problems typically seen in
training of neural networks.
( 2
min )
Mitigating the discrimination of machine learning models has gained
increasing attention in medical image analysis. However, few works address fair
treatment of patients with multiple sensitive demographic attributes, which is
a crucial yet challenging problem for real-world clinical applications. In this
paper, we propose a novel method for fair representation learning with respect
to multi-sensitive attributes. We pursue the independence between target and
multi-sensitive representations by achieving orthogonality in the
representation space. Concretely, we enforce the column space orthogonality by
keeping target information on the complement of a low-rank sensitive space.
Furthermore, in the row space, we encourage feature dimensions between target
and sensitive representations to be orthogonal. The effectiveness of the
proposed method is demonstrated with extensive experiments on the CheXpert
dataset. To our best knowledge, this is the first work to mitigate unfairness
with respect to multiple sensitive attributes in the field of medical imaging.
( 2
min )
We present a new convolution layer for deep learning architectures which we
call QuadConv -- an approximation to continuous convolution via quadrature. Our
operator is developed explicitly for use on non-uniform, mesh-based data, and
accomplishes this by learning a continuous kernel that can be sampled at
arbitrary locations. Moreover, the construction of our operator admits an
efficient implementation which we detail and construct. In the setting of
compressing data arising from partial differential equation (PDE) simulations,
we show that QuadConv can match the performance of standard discrete
convolutions on uniform grid data by comparing a QuadConv autoencoder (QCAE) to
a standard convolutional autoencoder (CAE). Further, we show that the QCAE can
maintain this accuracy even on non-uniform data.
( 2
min )
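As an illustration of the quadrature idea behind QuadConv, the sketch below approximates a continuous 1-D convolution on non-uniform sample points as a weighted sum over quadrature nodes. The fixed Gaussian kernel and the crude local-spacing weights are stand-ins for the learned kernel and the paper's actual quadrature scheme, not its implementation.

```python
import numpy as np

def quadconv(u, points, weights, centers, kernel):
    """Quadrature approximation of continuous convolution on non-uniform points:
    (K * u)(x_i) ~= sum_j w_j * kernel(x_i - p_j) * u(p_j)."""
    diff = centers[:, None] - points[None, :]        # (n_out, n_in) displacements
    return (kernel(diff) * weights * u).sum(axis=1)  # (n_out,)

# Non-uniform sample points on [0, 1] with crude local quadrature weights
rng = np.random.default_rng(0)
points = np.sort(rng.random(64))
weights = np.gradient(points)                        # local spacing as weights
u = np.sin(2 * np.pi * points)                       # signal sampled on the mesh
gauss = lambda d: np.exp(-0.5 * (d / 0.05) ** 2)     # stand-in for a learned kernel
out = quadconv(u, points, weights, np.linspace(0, 1, 32), gauss)
print(out.shape)  # (32,)
```

Because the kernel is a continuous function, it can be evaluated at arbitrary displacements, which is what frees the operator from uniform grids.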
A current goal in the graph neural network literature is to enable
transformers to operate on graph-structured data, given their success on
language and vision tasks. Since the transformer's original sinusoidal
positional encodings (PEs) are not applicable to graphs, recent work has
focused on developing graph PEs, rooted in spectral graph theory or various
spatial features of a graph. In this work, we introduce a new graph PE, Graph
Automaton PE (GAPE), based on weighted graph-walking automata (a novel
extension of graph-walking automata). We compare the performance of GAPE with
other PE schemes on both machine translation and graph-structured tasks, and we
show that it generalizes several other PEs. An additional contribution of this
study is a theoretical and controlled experimental comparison of many recent
PEs in graph transformers, independent of the use of edge features.
( 2
min )
Molecular conformation generation (MCG) is a fundamental and important
problem in drug discovery. Many traditional methods have been developed to
solve the MCG problem, such as systematic searching, model-building, random
searching, distance geometry, molecular dynamics, Monte Carlo methods, etc.
However, they have some limitations depending on the molecular structures.
Recently, many deep learning based MCG methods have been proposed, which claim
to largely outperform the traditional methods. However, to our surprise, we
design a simple, cheap, parameter-free algorithm based on the traditional
methods and find it comparable to, or even better than, deep learning based MCG
methods on the widely used GEOM-QM9 and GEOM-Drugs benchmarks. In particular,
our algorithm is simply the clustering of the RDKit-generated
conformations. We hope our findings can help the community to revise the deep
learning methods for MCG. The code of the proposed algorithm could be found at
https://gist.github.com/ZhouGengmo/5b565f51adafcd911c0bc115b2ef027c.
( 2
min )
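The parameter-free baseline described above (clustering of RDKit-generated conformers) can be sketched roughly as follows. Random coordinates stand in for RDKit output here, and the unaligned RMSD and greedy medoid rule are illustrative simplifications, not the authors' exact procedure.

```python
import numpy as np

def pairwise_rmsd(confs):
    # confs: (n_confs, n_atoms, 3); naive RMSD without alignment (illustrative)
    diff = confs[:, None] - confs[None, :]           # (n, n, n_atoms, 3)
    return np.sqrt((diff ** 2).sum(-1).mean(-1))     # (n, n)

def cluster_medoids(confs, k):
    """Greedy k-medoid selection: start from the most central conformation,
    then repeatedly add the conformation farthest from all chosen ones."""
    d = pairwise_rmsd(confs)
    chosen = [int(d.sum(1).argmin())]
    while len(chosen) < k:
        chosen.append(int(d[chosen].min(0).argmax()))
    return confs[chosen]

rng = np.random.default_rng(0)
confs = rng.normal(size=(50, 12, 3))                 # stand-in for RDKit conformers
reps = cluster_medoids(confs, k=5)
print(reps.shape)  # (5, 12, 3)
```

In practice the conformers would come from RDKit's ETKDG embedding rather than random noise; the point of the abstract is that this kind of cheap post-processing already matches far heavier learned models.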
Contrastive learning is a powerful framework for learning self-supervised
representations that generalize well to downstream supervised tasks. We show
that multiple existing contrastive learning methods can be reinterpreted as
learning kernel functions that approximate a fixed positive-pair kernel. We
then prove that a simple representation obtained by combining this kernel with
PCA provably minimizes the worst-case approximation error of linear predictors,
under a straightforward assumption that positive pairs have similar labels. Our
analysis is based on a decomposition of the target function in terms of the
eigenfunctions of a positive-pair Markov chain, and a surprising equivalence
between these eigenfunctions and the output of Kernel PCA. We give
generalization bounds for downstream linear prediction using our Kernel PCA
representation, and show empirically on a set of synthetic tasks that applying
Kernel PCA to contrastive learning models can indeed approximately recover the
Markov chain eigenfunctions, although the accuracy depends on the kernel
parameterization as well as on the augmentation strength.
( 2
min )
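The Kernel PCA step referenced above can be sketched directly from a precomputed kernel matrix: double-center the kernel, take its top eigenvectors, and scale them by the square roots of the eigenvalues. The RBF kernel below is only a stand-in for a learned positive-pair kernel.

```python
import numpy as np

def kernel_pca(K, n_components):
    """Kernel PCA on a precomputed symmetric kernel matrix K (n x n)."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one       # double-center the kernel
    vals, vecs = np.linalg.eigh(Kc)                  # eigh returns ascending order
    idx = np.argsort(vals)[::-1][:n_components]      # keep the top components
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.clip(vals, 0, None))    # projected coordinates

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
K = np.exp(-0.5 * np.square(X[:, None] - X[None]).sum(-1))  # RBF stand-in kernel
Z = kernel_pca(K, n_components=3)
print(Z.shape)  # (40, 3)
```

Substituting a kernel induced by a trained contrastive model for the RBF stand-in is the operation the abstract analyzes.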
Tongue twisters are meaningful sentences that are difficult to pronounce. The
process of automatically generating tongue twisters is challenging since the
generated utterance must satisfy two conditions at once: phonetic difficulty
and semantic meaning. Furthermore, phonetic difficulty is itself hard to
characterize and is expressed in natural tongue twisters through a
heterogeneous mix of phenomena such as alliteration and homophony. In this
paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue
Twisters Automatically. We leverage phoneme representations to capture the
notion of phonetic difficulty, and we train language models to generate
original tongue twisters on two proposed task settings. To do this, we curate a
dataset called PANCETTA, consisting of existing English tongue twisters.
Through automatic and human evaluation, as well as qualitative analysis, we
show that PANCETTA generates novel, phonetically difficult, fluent, and
semantically meaningful tongue twisters.
( 2
min )
The Baum-Welch (B-W) algorithm is the most widely accepted method for
inferring hidden Markov models (HMM). However, it is prone to getting stuck in
local optima, and can be too slow for many real-time applications. Spectral
learning of HMMs (SHMMs), based on the method of moments (MOM) has been
proposed in the literature to overcome these obstacles. Despite its promises,
asymptotic theory for SHMM has been elusive, and the long-run performance of
SHMM can degrade due to unchecked propagation of error. In this paper, we (1)
provide an asymptotic distribution for the approximation error of the likelihood
estimated by SHMM, (2) propose a novel algorithm called projected SHMM
(PSHMM) that mitigates the problem of error propagation, and (3) develop online
learning variations of both SHMM and PSHMM that accommodate potential
nonstationarity. We compare the performance of SHMM with PSHMM and estimation
through the B-W algorithm on both simulated data and data from real world
applications, and find that PSHMM not only retains the computational advantages
of SHMM, but also provides more robust estimation and forecasting.
( 2
min )
Arunachalam and De Wolf (2018) showed that the sample complexity of quantum
batch learning of boolean functions, in the realizable and agnostic settings,
has the same form and order as the corresponding classical sample complexities.
In this paper, we extend this ostensibly surprising message to batch
multiclass learning, online boolean learning, and online multiclass learning.
For our online learning results, we first consider an adaptive adversary
variant of the classical model of Dawid and Tewari (2022). Then, we introduce
the first (to the best of our knowledge) model of online learning with quantum
examples.
( 2
min )
We propose to explore the potential of physics-informed neural networks
(PINNs) in solving a class of partial differential equations (PDEs) used to
model the propagation of chronic inflammatory bowel diseases, such as Crohn's
disease and ulcerative colitis. An unsupervised approach was adopted for
training the deep neural networks. Given the complexity of the underlying
biological system, characterized by intricate feedback loops and limited
availability of high-quality data, the aim of this study is to explore the
potential of PINNs in solving PDEs. In addition to providing this exploratory
assessment, we also aim to emphasize the principles of reproducibility and
transparency in our approach, with a specific focus on ensuring the robustness
and generalizability through the use of artificial intelligence. We will
quantify the relevance of the PINN method with several linear and non-linear
PDEs in relation to biology. However, it is important to note that the final
solution is dependent on the initial conditions, chosen boundary conditions,
and neural network architectures.
( 2
min )
Gamma-Phi losses constitute a family of multiclass classification loss
functions that generalize the logistic and other common losses, and have found
application in the boosting literature. We establish the first general
sufficient condition for the classification-calibration of such losses. In
addition, we show that a previously proposed sufficient condition is in fact
not sufficient.
( 2
min )
Dimensionality reduction (DR) plays a vital role in the visual analysis of
high-dimensional data. One main aim of DR is to reveal hidden patterns that lie
on intrinsic low-dimensional manifolds. However, DR often overlooks important
patterns when the manifolds are distorted or masked by certain influential data
attributes. This paper presents a feature learning framework, FEALM, designed
to generate a set of optimized data projections for nonlinear DR in order to
capture important patterns in the hidden manifolds. These projections produce
maximally different nearest-neighbor graphs so that resultant DR outcomes are
significantly different. To achieve such a capability, we design an
optimization algorithm as well as introduce a new graph dissimilarity measure,
named neighbor-shape dissimilarity. Additionally, we develop interactive
visualizations to assist comparison of obtained DR results and interpretation
of each DR result. We demonstrate FEALM's effectiveness through experiments and
case studies using synthetic and real-world datasets.
( 2
min )
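To make the "maximally different nearest-neighbor graphs" idea concrete, the sketch below compares the kNN graphs of two data projections using a per-node Jaccard distance. This is a simple stand-in for FEALM's neighbor-shape dissimilarity, not the paper's measure or its optimization algorithm.

```python
import numpy as np

def knn_graph(X, k):
    """Boolean kNN adjacency (excluding self) for points X."""
    d = np.square(X[:, None] - X[None]).sum(-1)      # squared Euclidean distances
    np.fill_diagonal(d, np.inf)                      # never choose self
    nn = np.argsort(d, axis=1)[:, :k]
    A = np.zeros(d.shape, dtype=bool)
    np.put_along_axis(A, nn, True, axis=1)
    return A

def graph_dissimilarity(A, B):
    """Mean per-node Jaccard distance between two neighbor sets
    (an illustrative stand-in for neighbor-shape dissimilarity)."""
    inter = (A & B).sum(1)
    union = (A | B).sum(1)
    return float(1.0 - (inter / np.maximum(union, 1)).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
P1, P2 = np.eye(6)[:, :2], np.eye(6)[:, 2:4]         # two axis-aligned projections
d = graph_dissimilarity(knn_graph(X @ P1, 10), knn_graph(X @ P2, 10))
print(0.0 < d <= 1.0)  # True
```

FEALM's optimizer would search over projections to maximize such a dissimilarity, so that each resulting DR view exposes a different hidden structure.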
Semi-supervised learning is a powerful technique for leveraging unlabeled
data to improve machine learning models, but it can be affected by the presence
of "informative" labels, which occur when some classes are more likely to be
labeled than others. In the missing data literature, such labels are called
missing not at random. In this paper, we propose a novel approach to address
this issue by estimating the missing-data mechanism and using inverse
propensity weighting to debias any SSL algorithm, including those using data
augmentation. We also propose a likelihood ratio test to assess whether or not
labels are indeed informative. Finally, we demonstrate the performance of the
proposed methods on different datasets, in particular on two medical datasets
for which we design pseudo-realistic missing data scenarios.
( 2
min )
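A minimal sketch of the inverse-propensity-weighting idea for informative (MNAR) labels: reweighting each labelled point by 1/P(labelled | class) debiases a naive estimate. The propensities are assumed known here; in the paper's setting they would be estimated from the data as part of the missing-data mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
y = rng.integers(0, 2, size=n)                       # true class labels
# Informative labelling: class 1 is labelled far more often (MNAR mechanism)
p_label = np.where(y == 1, 0.8, 0.2)
observed = rng.random(n) < p_label

# Naive class-prior estimate from labelled data alone is biased toward class 1
naive = y[observed].mean()

# IPW estimate: weight each labelled point by the inverse labelling probability
w = 1.0 / p_label[observed]
ipw = (w * y[observed]).sum() / w.sum()

print(round(naive, 2), round(ipw, 2))                # naive near 0.8, IPW near 0.5
```

The same weights can multiply the per-example loss of any SSL algorithm, which is how the abstract's debiasing applies beyond simple mean estimation.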
In this paper, we propose a model-free feature selection method for
ultra-high dimensional data with massive numbers of features. It is a two-phase
procedure that uses the fused Kolmogorov filter together with random forest
based RFE to remove model limitations and reduce computational complexity. The
method is fully nonparametric and can work with various types of datasets. It
has several appealing characteristics, i.e., accuracy, model-free, and
computational efficiency, and can be widely used in practical problems, such as
multiclass classification, nonparametric regression, and Poisson regression,
among others. We show that the proposed method is selection consistent and
$L_2$ consistent under weak regularity conditions. We further demonstrate the
superior performance of the proposed method over other existing methods by
simulations and real data examples.
( 2
min )
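The first (screening) phase can be illustrated with a marginal Kolmogorov-Smirnov statistic per feature: features whose class-conditional distributions differ most are kept. This is a plain sketch of Kolmogorov-filter-style screening, not the paper's fused variant, and the random-forest RFE phase is omitted.

```python
import numpy as np

def kolmogorov_filter(X, y, top_k):
    """Rank features by the largest Kolmogorov-Smirnov distance between
    class-conditional empirical CDFs (a model-free marginal utility)."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        grid = np.sort(X[:, j])
        # empirical CDF of feature j within each class, evaluated on the grid
        cdfs = [np.searchsorted(np.sort(X[y == c, j]), grid, side="right")
                / (y == c).sum() for c in classes]
        # score = largest sup-distance between any pair of class CDFs
        scores[j] = max(np.abs(a - b).max()
                        for i, a in enumerate(cdfs) for b in cdfs[i + 1:])
    return np.argsort(scores)[::-1][:top_k]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 30))
X[:, 4] += 2.0 * y                                   # feature 4 separates classes
print(kolmogorov_filter(X, y, top_k=3)[0])  # 4
```

Because the statistic is rank-based and makes no model assumption, the same screening works for multiclass classification, nonparametric regression (after slicing the response), and count responses.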
Hi everyone, I used several machine vision algorithms to determine the fastest lane at border crossings. I have worked on this for the past few months and would love to know what you think about it. You can check out the detailed steps and code in the Medium article at this link.
submitted by /u/andrea_m2000
( 41
min )
Modern model pre-training often calls for larger cluster deployment to reduce time and cost. At the server level, such training workloads demand faster compute and increased memory allocation. As models grow to hundreds of billions of parameters, they require a distributed training mechanism that spans multiple nodes (instances). In October 2022, we launched Amazon EC2 […]
( 10
min )
Enterprises across the globe are looking to utilize multiple data sources to implement a unified search experience for their employees and end customers. Considering the large volume of data that needs to be examined and indexed, the retrieval speed, solution scalability, and search performance become key factors to consider when choosing an enterprise intelligent search […]
( 7
min )
Novel AI technologies are generating images, stories and, now, new ways to imagine the automotive future. At NVIDIA GTC, a global conference for the era of AI and the metaverse running online March 20-23, industry luminaries working on these breakthroughs will come together and share their visions to transform transportation. This year’s slate of in-depth […]
( 5
min )
The video above represents one of the first times that a pangolin, one of the world’s most critically endangered species, was detected in real time using artificial intelligence. A U.K.-based nonprofit called Conservation AI made this possible with the help of NVIDIA technology. Such use of AI can help track even the rarest, most reclusive […]
( 7
min )
Fellow Hunters, get ready! This GFN Thursday welcomes Capcom’s Monster Hunter Rise and the expansion Sunbreak to the cloud, arriving soon for members. Settle down for the weekend with 10 new games supported in the GeForce NOW library, including The Settlers: New Allies. Plus, Amsterdam and Ashburn are next to light up on the RTX […]
( 5
min )
We’re clarifying how ChatGPT's behavior is shaped and our plans for improving that behavior, allowing more user customization, and getting more public input into our decision-making in these areas.
OpenAI’s mission is to ensure that artificial general intelligence (AGI)[1] benefits all of humanity.
( 6
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
( 10
min )